Abstract: We study the diffusion of information in an overlaying
social-physical network. Specifically, we consider the following
set-up: There is a physical information network where informa- 
tion spreads amongst people through conventional communica- 
tion media (e.g., face-to-face communication, phone calls), and 
conjoint to this physical network, there are online social networks 
where information spreads via web sites such as Facebook, 
Twitter, FriendFeed, YouTube, etc. We quantify the size and 
the critical threshold of information epidemics in this conjoint 
social-physical network by assuming that information diffuses 
according to the SIR epidemic model. One interesting finding is 
that even if there is no percolation in the individual networks, 
percolation (i.e., information epidemics) can take place in the 
conjoint social-physical network. We also show, both analytically 
and experimentally, that the fraction of individuals who receive 
an item of information (started from an arbitrary node) is 
significantly larger in the conjoint social-physical network case, 
as compared to the case where the networks are disjoint. These 
findings reveal that conjoining the physical network with online 
social networks can have a dramatic impact on the speed and 
scale of information diffusion. 

Key words: Information Diffusion, Coupled Social Net- 
works, Percolation Theory, Random Graphs. 



I. Introduction 



A. Motivation 



Modern society relies on basic physical network infrastruc- 
tures, such as power stations, telecommunication networks 
and transportation systems. Recently, due to advances in 
communication technologies and cyber-physical systems, these 
infrastructures have become increasingly dependent on one 
another and have emerged as interdependent networks.
One archetypal example of such coupled systems is the smart
grid, where the power stations and the communication network
controlling them are coupled together. See the pioneering work
of Buldyrev et al. [2] as well as [3], [4], [5], [6] for a diverse
set of models on coupled networks.

Apart from physical infrastructure networks, coupling can 
also be observed between different types of social networks. 
Traditionally, people are tied together in a physical information 
network through old-fashioned communication media, such as 
face-to-face interactions. On the other hand, recent advances in
Internet and mobile communication technologies have enabled
people to be connected more closely through online social 
networks. Indeed, people can now interact through e-mail or 
online chatting, or communicate through a Web 2.0 website 
such as Facebook, Twitter, FriendFeed, YouTube, etc. Clearly, 
the physical information network and online social networks 
are not completely separate since people may participate in 
two or more of these networks at the same time. For instance, 
a person can forward a message to his/her online friends 
via Facebook and Twitter upon receiving it from someone 
through face-to-face communication. As a result, the infor- 
mation spread in one network may trigger the propagation 
in another network, and may result in a possible cascade 
of information. One conjecture is that due to this coupling 
between the physical and online social networks, today's 
breaking news (and information in general) can spread at an 
unprecedented speed throughout the population, and this is the 
main subject of the current study. 

Information cascades over coupled networks can deeply 
influence the patterns of social behavior. Indeed, people have 
become increasingly aware of the fundamental role of the
coupled social-physical networks¹ as a medium for the spread
of not only information, but also ideas and influence. Twitter
has emerged as an ultra-fast source of news [7] and Facebook 
has attracted major businesses and politicians for advertising 
products or candidates. Several music groups or singers have 
gained international fame by uploading videos to YouTube. In 
almost all cases, a new video uploaded to YouTube, a rumor 
started in Facebook or Twitter, or a political movement adver- 
tised through online social networks, either dies out quickly 
or reaches a significant proportion of the population. In order 
to fully understand the extent to which these events happen, 
it is of great interest to consider the combined behavior of the 
physical information network and online social networks. 

B. Related Work 

Despite the fact that information diffusion has received a
great deal of research interest from various disciplines for
over a decade, there has been little study on the analysis of
information diffusion across coupled networks; most of the
works consider information propagation only within a single
network. The existing literature on this topic is much too broad
to survey here, but we will attempt to cover the works that
are most relevant to our study. To this end, existing studies
can be roughly classified into two categories. The first type
of studies [8], [9], [10], [11], [12], [13] are empirical and
analyze various aspects of information diffusion using large-scale
datasets from existing online social networks. Some of
the interesting questions that have been raised (and answered)
in these references include "What are the roles of behavioral
properties of the individuals and the strength of their ties in the
dynamics of information diffusion?" [10], [11], "How do blogs
influence each other?" [13], and "How does the topology of the
underlying social network affect the spread of information?"
[12].

¹Throughout, we sometimes refer to the physical information network
simply as the physical network, whereas we refer to online social networks
simply as social networks. Hence the term coupled (or overlaying, or conjoint)
social-physical network.

The second type of studies [14], [15], [16], [17], [18], [19],
[20] build mathematical models to analyze the mechanisms by
which information diffuses across the population. These
references study the spread of diseases (rather than information) in
small-world networks [17], [18], scale-free networks [19], and
networks with arbitrary degree distributions [20]. However, by
the well-known analogy between the spread of diseases and
information [21], [22], [23], their results also apply in the
context of information diffusion. Another notable work in this
group is [24], which studies the spread of rumors in a network
with multiple communities.

Setting aside the information diffusion problem, there has
been some recent interest in various properties of coupled
(or interacting or layered) networks (see [1], [25], [26], [27],
[28]). For instance, [1], [25] and [26] consider a layered
network structure where the networks in distinct layers are
composed of identical nodes. On the other hand, in [27],
the authors study the percolation problem in two interacting
networks with completely disjoint vertex sets; their model is
similar to the interdependent networks introduced in [2]. Recently,
[28] studied the susceptible-infectious-susceptible (SIS) epidemic
model in an interdependent network.

C. Summary of Main Contributions 

The current paper belongs to the second type of studies
introduced above and aims to develop a new theoretical framework
towards understanding the characteristics of information
diffusion across multiple coupled networks. Although empirical
studies are valuable in their own right, the modeling approach
adopted here reveals subtle relations between the network
parameters and the dynamics of information diffusion, thereby
allowing us to develop a fundamental understanding of how
conjoining multiple networks extends the scale of information
diffusion. The interested reader is also referred to the article
by Epstein [29], which discusses many benefits of building and
studying mathematical models; see also [30].

For illustration purposes, we give the definitions of our
model in the context of an overlaying social-physical network.
Specifically, there is a physical information network where
information spreads amongst people through conventional
communication media (e.g., face-to-face communication, phone
calls), and conjoint to this physical network, there are online
social networks offering alternative platforms for information
diffusion, such as Facebook, Twitter, YouTube, etc. In the
interest of easy exposition, we focus on the case where there
exists only one online social network along with the physical
information network; see the Appendix for an extension to the
multiple social networks case. We model the physical network
and the social network as random graphs with specified
degree distributions [31]. We assume that each individual in
a population of size n is a member of the physical network,
and becomes a member of the social network independently
with a certain probability. It is also assumed that information
is transmitted between two nodes (that are connected by a
link in any one of the graphs) according to the
susceptible-infectious-recovered (SIR) model; see Section II for precise
definitions.

Our main findings can be outlined as follows. We show
that the overlaying social-physical network exhibits a "critical
point" above which information epidemics are possible; i.e.,
a single node can spread an item of information (a rumor,
an advertisement, a video, etc.) to a positive fraction of
individuals in the asymptotic limit. Below this critical threshold,
only small information outbreaks can occur and the fraction
of informed individuals always tends to zero. We quantify
the aforementioned critical point in terms of the degree
distributions of the networks and the fraction of individuals
that are members of the online social network. Further, we
compute the probability that an item of information originating
from an arbitrary individual will yield an epidemic, along with the
resulting fraction of individuals that are informed. Finally, in
the cases where the fraction of informed individuals tends to
zero (non-epidemic state), we compute the expected number
of individuals that receive an item of information started from a
single arbitrary node.

These results are obtained by mapping the information
diffusion process to an equivalent bond percolation problem [32]
in the conjoint social-physical network, and then analyzing the
phase transition properties of the corresponding random graph
model. This problem is intricate since the relevant random
graph model corresponds to a union of coupled random graphs,
and the results obtained in [20], [31] for single networks fall
short of characterizing its phase transition properties. To
overcome these difficulties, we introduce a multi-type branching
process and analyze it through an appropriate extension of the
method of generating functions [31].

To validate our analytical results, we also perform extensive
simulation experiments on synthetic networks that exhibit
similar characteristics to some real-world networks. In particular,
we verify our analysis on networks with power-law degree
distributions with exponential cut-off and on Erdos-Renyi (ER)
networks [33]; it has been shown [34] that many real networks,
including the Internet, exhibit power-law distributions with
exponential cut-off. We show that conjoining the networks
can significantly increase the scale of information diffusion,
even with only one social network. To give a simple example,
consider a physical information network W and an online
social network F that are ER graphs with respective mean
degrees $\lambda_w$ and $\lambda_f$, and assume that each node in W is a
member of F independently with probability $\alpha$. If $\lambda_w = 0.6$
and $\alpha = 0.2$, we show that information epidemics are possible
in the overlaying social-physical network $H = W \cup F$ whenever
$\lambda_f > 0.77$. In stark contrast, this happens only if $\lambda_w > 1$ or
$\lambda_f > 1$ when the two networks are disjoint. Furthermore,
in a single ER network W with $\lambda_w = 1.5$, an item of information
originating from an arbitrary individual gives rise to an
epidemic with probability 0.58 (i.e., it can reach at most 58% of
the individuals). However, if the same network W is conjoined
with an ER network F with $\alpha = 0.5$ and $\lambda_f = 1.5$, the
probability of an epidemic becomes 0.82 (indicating that up to
82% of the population can be influenced). These results show
that the conjoint social-physical network can spread an item of
information to a significantly larger fraction of the population
as compared to the case where the two networks are disjoint.
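The threshold 0.77 and the single-network epidemic probability 0.58 quoted above can be checked with a short sketch. It assumes that every link is occupied ($T_w = T_f = 1$) and works in the coupled-ER setting of Section III-B. The closed form for the threshold comes from requiring the largest eigenvalue of the 2×2 mean-offspring matrix (nodes in F versus nodes outside F) to equal 1; the function names are illustrative only.

```python
import math


def coupled_er_threshold(lam_w: float, alpha: float) -> float:
    """Smallest lambda_f for which H = W ∪ F has a giant component (T_w = T_f = 1).

    Sketch for coupled ER graphs. Splitting nodes into F-members (type 1) and
    non-members (type 2), the mean-offspring matrix is
        [[lam_f + alpha*lam_w, (1 - alpha)*lam_w],
         [alpha*lam_w,         (1 - alpha)*lam_w]],
    and its largest eigenvalue equals 1 exactly when
        1 - (lam_f + lam_w) + (1 - alpha) * lam_f * lam_w = 0.
    """
    if lam_w >= 1.0:
        return 0.0  # W on its own is already supercritical
    return (1.0 - lam_w) / (1.0 - (1.0 - alpha) * lam_w)


def er_giant_fraction(lam: float, tol: float = 1e-12, max_iter: int = 1_000_000) -> float:
    """Largest root of S = 1 - exp(-lam * S): giant-component fraction of one ER graph."""
    s = 1.0  # iterating downward from 1 converges to the largest root
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-lam * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


print(round(coupled_er_threshold(0.6, 0.2), 2))  # 0.77
print(round(er_giant_fraction(1.5), 2))          # 0.58
```

The closed form also shows why conjoining helps: for any $\alpha > 0$ and $0 < \lambda_w < 1$, the required $\lambda_f$ is strictly below 1.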

The above conclusions are predicated on the social net- 
work F containing a positive fraction of the population. This 
assumption is indeed realistic since more than 50% of the 
adult population in the US use Facebook [11]. However, for
completeness we also analyze (see Section V) the case where
the social network F contains only $\lfloor n^\gamma \rfloor$ nodes with $\gamma < 1$. In
that case, we show analytically that no matter how connected 
F is, conjoining it to the physical network W does not change 
the threshold and the expected size of information epidemics. 

Our results provide a complete characterization of the infor- 
mation diffusion process in a coupled social-physical network, 
by revealing the relation between the network parameters and 
the most interesting quantities including the critical threshold, 
probability and expected size of information epidemics. To 
the best of our knowledge, there has been no work in the
literature that studies information diffusion in overlay
networks whose vertex sets are neither identical nor disjoint.
We believe that our findings along this line shed light on
information propagation across coupled
social-physical networks.

D. Notation and Conventions 

All limiting statements, including asymptotic equivalences,
are understood with n going to infinity. The random
variables (rvs) under consideration are all defined on the same
probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Probabilistic statements are made
with respect to this probability measure $\mathbb{P}$, and we denote the
corresponding expectation operator by $\mathbb{E}$. The mean value of a
random variable $k$ is denoted by $\langle k \rangle$. We use the notation $\stackrel{st}{=}$
to indicate distributional equality, $\xrightarrow{a.s.}$ to indicate almost sure
convergence and $\xrightarrow{p}$ to indicate convergence in probability.
For any discrete set $S$ we write $|S|$ for its cardinality. For a
random graph $G$ we write $C_i(G)$ for the number of nodes in
its $i$th largest connected component; i.e., $C_1(G)$ stands for the
size of the largest component, $C_2(G)$ for the size of the second
largest component, etc.

The indicator function of an event $E$ is denoted by $\mathbf{1}[E]$. We
say that an event holds with high probability (whp) if it holds
with probability tending to 1 as $n \to \infty$. For sequences
$\{a_n\}, \{b_n\}: \mathbb{N}_0 \to \mathbb{R}_+$, we write $a_n = o(b_n)$ as a shorthand for the relation
$\lim_{n \to \infty} a_n / b_n = 0$, whereas $a_n = O(b_n)$ means that there exists
$c > 0$ such that $a_n \le c\, b_n$ for all $n$ sufficiently large. Also,
we have $a_n = \Omega(b_n)$ if $b_n = O(a_n)$, or equivalently, if there
exists $c > 0$ such that $a_n \ge c\, b_n$ for all $n$ sufficiently large.
Finally, we write $a_n = \Theta(b_n)$ if we have $a_n = O(b_n)$ and
$a_n = \Omega(b_n)$ at the same time.

E. Organization of the Paper 

The rest of the paper is organized as follows. In Section
II we introduce a model for the overlaying social-physical
network. Section III summarizes the main results of the paper
that deal with the critical point and the size of information
epidemics. In Section IV we illustrate the theoretical findings
of the paper with numerical results and verify them via
extensive simulations. In Section V we study information
diffusion in an interesting case where only a sublinear fraction
of individuals are members of the online social network. The
proofs of the main results are provided in Sections VI through
VIII. In the Appendix, we demonstrate an extension of the
main results to the case where there are multiple online social
networks.

II. System Model 

A. Overlay Network Model 

We consider the following model for an overlaying
social-physical network. Let W stand for the physical information
network of human beings on the node set $\mathcal{N} = \{1, \ldots, n\}$.
Next, let F stand for an online social networking web site, e.g.,
Facebook. We assume that each node in $\mathcal{N}$ is a member of this
auxiliary network with probability $\alpha \in (0, 1]$ independently
from any other node. In other words, we let

$$\mathbb{P}[i \in \mathcal{N}_F] = \alpha, \qquad i = 1, \ldots, n, \tag{1}$$

with $\mathcal{N}_F$ denoting the set of human beings that are members
of Facebook. With this assumption, it is clear that the vertex
set $\mathcal{N}_F$ of F satisfies

$$\frac{|\mathcal{N}_F|}{n} \xrightarrow{a.s.} \alpha \tag{2}$$

by the law of large numbers (we consider the case where
$|\mathcal{N}_F| = o(n)$ separately in Section V).

We define the structure of the networks W and F through
their respective degree distributions $\{p^w_k\}$ and $\{p^f_k\}$. In
particular, we specify a degree distribution that gives the properly
normalized probabilities $\{p^w_k, k = 0, 1, \ldots\}$ that an arbitrary
node in W has degree $k$. Then, we let each node $i = 1, \ldots, n$
in $W = W(n; \{p^w_k\})$ have a random degree drawn from
the distribution $\{p^w_k\}$ independently from any other node.
Similarly, we assume that the degrees of all nodes in
$F = F(n; \alpha, \{p^f_k\})$ are drawn independently from the distribution
$\{p^f_k, k = 0, 1, \ldots\}$. This corresponds to generating both
networks (independently) according to the configuration model
[31], [35]. In what follows, we shall assume that the degree
distributions are well-behaved in the sense that all moments
of arbitrary order are finite.

In order to study information diffusion amongst human 
beings, a key step is to characterize an overlay network H 
that is constructed by taking the union of W and F. In other 
words, for any distinct pair of nodes $i, j$, we say that $i$ and $j$
are adjacent in the network H, denoted $i \sim_H j$, as long as at
least one of the conditions $\{i \sim_W j\}$ or $\{i \sim_F j\}$ holds. This
is intuitive since a node i can forward information to another 
node j either by using old-fashioned communication channels 
(i.e., links in W) or by using Facebook (i.e., links in F). Of 
course, for the latter to be possible, both i and j should be 
Facebook users. 

The overlay network $H = W \cup F$ constitutes an ensemble
of the colored degree-driven random graphs proposed in [36].
Let $\{1, 2\}$ be the space of possible colors (or types) of edges
in H; specifically, we say the edges in Facebook are of type
1, while the edges in the physical network are said to be of
type 2. The colored degree of a node $i$ is then represented by
an integer vector $\boldsymbol{d}_i = [d^i_f, d^i_w]$, where $d^i_f$ (resp. $d^i_w$) stands
for the number of Facebook edges (resp. physical connections)
that are incident on node $i$. Under the given assumptions on
the degree distributions of W and F, the colored degrees (i.e.,
$\boldsymbol{d}_1, \ldots, \boldsymbol{d}_n$) will be independent and identically distributed
according to a colored degree distribution $\{p_{\boldsymbol{d}}\}$ such that

$$p_{\boldsymbol{d}} = \left(\alpha\, p^f_{d_f} + (1 - \alpha)\,\mathbf{1}[d_f = 0]\right) \cdot p^w_{d_w}, \qquad \boldsymbol{d} = (d_f, d_w), \tag{3}$$

due to the independence of F and W. The term $(1 - \alpha)\mathbf{1}[d_f = 0]$
accommodates the possibility that a node is not a member of
the online social network, in which case the number $d_f$ of
F-edges is automatically zero.

Given that the colored degrees are picked such that $\sum_{i=1}^{n} d^i_f$
and $\sum_{i=1}^{n} d^i_w$ are even, we construct H as in [36], [20]: Each
node $i = 1, \ldots, n$ is first given the appropriate number $d^i_f$ and
$d^i_w$ of stubs of type 1 and type 2, respectively. Then, pairs of
these stubs that are of the same type are picked randomly and
connected together to form complete edges; clearly, two stubs
can be connected together only if they are of the same type.
Pairing of stubs continues until none is left.
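The stub-pairing construction just described can be sketched as follows. This is an illustrative implementation, not the authors' code: `sample_kw` and `sample_kf` are hypothetical callables standing in for draws from $\{p^w_k\}$ and $\{p^f_k\}$, and when a stub total comes out odd the sketch simply adds one stub to a single node, which is one assumed way of meeting the evenness condition. As in the plain configuration model, self-loops and multi-edges are not removed.

```python
import random


def build_overlay(n, alpha, sample_kw, sample_kf, seed=0):
    """Colored configuration model for H = W ∪ F (illustrative sketch).

    Returns (members, d_w, d_f, edges_w, edges_f): the member set N_F, the
    colored degrees, and the (u, v) edge lists of each type.
    """
    rng = random.Random(seed)
    members = [i for i in range(n) if rng.random() < alpha]  # membership as in (1)
    d_w = [sample_kw(rng) for _ in range(n)]                  # type-2 (physical) stubs
    d_f = [0] * n                                             # non-members have d_f = 0, cf. (3)
    for i in members:
        d_f[i] = sample_kf(rng)                               # type-1 (Facebook) stubs

    # Assumed fix-up: make each stub total even by adding a single stub.
    if sum(d_w) % 2:
        d_w[0] += 1
    if sum(d_f) % 2:
        d_f[members[0]] += 1  # an odd F-total implies at least one member exists

    def pair(degrees):
        # One list entry per stub; shuffling and pairing consecutive entries
        # matches the stubs of this type uniformly at random.
        stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
        rng.shuffle(stubs)
        return list(zip(stubs[0::2], stubs[1::2]))

    return set(members), d_w, d_f, pair(d_w), pair(d_f)


members, d_w, d_f, edges_w, edges_f = build_overlay(
    1000, 0.3, lambda r: r.randint(0, 4), lambda r: r.randint(0, 6), seed=1
)
```

Because stubs are only ever paired within their own type, every Facebook edge automatically joins two members of $\mathcal{N}_F$.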

B. Information Propagation Model 

Now, consider the diffusion of a piece of information in 
the overlay network H which starts from a single node. We 
assume that information spreads from a node to its neighbors 
according to the SIR epidemic model. In this context, an 
individual is either susceptible (S) meaning that she has not 
yet received a particular item of information, or infectious (I) 
meaning that she is aware of the information and is capable 
of spreading it to her contacts, or recovered (R) meaning 
that she is no longer spreading the information. This analogy
between the spread of diseases and the spread of information in
a network has long been recognized [21], and the SIR epidemic
model is commonly used in similar studies; e.g., see [22]
(diffusion of worms in online social networks), [21] (diffusion
of information through blogs), and [23] (diffusion of files in
peer-to-peer file sharing networks), among others.

The dynamics of information diffusion can now be
described as in [20]: We assume that an infectious individual
$i$ transmits the information to a susceptible contact $j$ with
probability $T_{ij}$, where

$$T_{ij} = 1 - e^{-r_{ij}\tau_i}.$$

Here, $r_{ij}$ denotes the average rate of being in contact over
the link from $i$ to $j$, and $\tau_i$ is the time $i$ keeps spreading the
information; i.e., the time it takes for $i$ to become recovered.



It is expected that the information propagates over the
physical and social networks at different speeds, which
manifests itself in different probabilities $T_{ij}$ across links in this case.
Specifically, let $T^w_{ij}$ stand for the probability of information
transmission over a link (between $i$ and $j$) in W and
let $T^f_{ij}$ denote the probability of information transmission
over a link in F. For simplicity, we assume that $T^w_{ij}$ and
$T^f_{ij}$ are independent for all distinct pairs $i, j = 1, \ldots, n$.
Furthermore, we assume that the contact rates $r^w_{ij}$ are
independent and identically distributed (i.i.d.) with probability
density $P_w(r)$, that the spreading times $\tau^w_i$ are i.i.d. with
probability density $P_w(\tau)$, and that the two families are independent. In that
case, it was shown in [20], [37] that information propagates
over W as if all transmission probabilities were equal to $T_w$,
where $T_w$ is the mean value of $T^w_{ij}$; i.e.,

$$T_w := \langle T^w_{ij} \rangle = 1 - \int_0^\infty \int_0^\infty e^{-r\tau} P_w(r) P_w(\tau)\, dr\, d\tau.$$

We refer to $T_w$ as the transmissibility of the information over
the physical network W and note that $0 \le T_w \le 1$. In the same
manner, we assume that $r^f_{ij}$ and $\tau^f_i$ are i.i.d. with respective
densities $P_f(r)$ and $P_f(\tau)$, leading to a transmissibility $T_f$ of
information over the online social network F.
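As a quick sanity check on the transmissibility formula above, one can estimate $T_w = \langle 1 - e^{-r\tau} \rangle$ by Monte Carlo and compare it with a case that has a closed form. The distributions below (a constant contact rate $r_0$ and an exponentially distributed spreading time with rate $\gamma$) are an assumption made only for illustration; for this choice $\mathbb{E}[e^{-r_0\tau}] = \gamma/(\gamma + r_0)$, so $T_w = r_0/(r_0 + \gamma)$.

```python
import math
import random


def transmissibility_mc(sample_rate, sample_time, trials=200_000, seed=0):
    """Monte Carlo estimate of T = <1 - exp(-r * tau)> with r and tau drawn independently."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        r = sample_rate(rng)    # contact rate r_ij
        tau = sample_time(rng)  # spreading time tau_i
        total += 1.0 - math.exp(-r * tau)
    return total / trials


r0, gamma = 2.0, 1.0  # illustrative values, not taken from the paper
T_est = transmissibility_mc(lambda rng: r0, lambda rng: rng.expovariate(gamma))
T_exact = r0 / (r0 + gamma)  # equals 1 - gamma / (gamma + r0)
```

Swapping in other samplers for the rate and the spreading time gives a numerical estimate of $T_w$ (or $T_f$) when no closed form is available.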

Under these assumptions, information diffusion becomes
equivalent to bond percolation on the conjoint network
$H = W \cup F$ [20], [37]. More specifically, assume that each
edge in W (resp. F) is occupied, meaning that it can
be used in spreading the information, with probability $T_w$
(resp. $T_f$) independently from all other edges. Then, the size
of an information outbreak started from an arbitrary node
is equal to the number of individuals that can be reached
from that initial node by using only the occupied links of
H. Hence, the threshold and the size of information
epidemics can be computed by studying the phase transition
properties of the random graph
$H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f) = W(n; \{p^w_k\}, T_w) \cup F(n; \alpha, \{p^f_k\}, T_f)$,
which is obtained by taking a union of the occupied edges of W and F. More
precisely, information epidemics can take place if and only if
$H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)$ has a giant connected component
that contains a positive fraction of nodes in the large-$n$ limit.
Also, an arbitrary node can trigger an information epidemic
only if it belongs to the giant component, in which case an
item of information started from that node will reach all nodes
in the giant component. Hence, the fractional size of the
giant component in $H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)$ gives both
the probability that an arbitrary node triggers an information
epidemic and the corresponding fractional size of the
information epidemic.
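The percolation mapping suggests a direct simulation: sample both layers, keep each edge with probability $T_w$ or $T_f$, and measure the largest connected component of the union. The sketch below does this for the ER special case of Section III-B under two simplifying assumptions of our own: each layer is drawn as a $G(N, M)$ graph with $M = \mathrm{round}(\lambda N / 2)$ uniformly random node pairs (a close stand-in for edge probability $\lambda/N$, with rare multi-edges), and components are tracked with a union-find structure. Names and parameters are illustrative.

```python
import random


def largest_component_fraction(n, alpha, lam_w, T_w, lam_f, T_f, seed=0):
    """Fraction of nodes in the largest component of the occupied union H = W ∪ F.

    Each layer is drawn as G(N, M) with M = round(lam * N / 2) random node
    pairs (assumed stand-in for edge probability lam / N); each drawn edge is
    kept ("occupied") with probability T.
    """
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest over all n individuals

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def add_layer(nodes, lam, T):
        for _ in range(round(lam * len(nodes) / 2)):
            u, v = rng.sample(nodes, 2)  # two distinct endpoints
            if rng.random() < T:         # edge occupied with probability T
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    everyone = list(range(n))
    members = [i for i in everyone if rng.random() < alpha]  # vertex set of F
    add_layer(everyone, lam_w, T_w)     # physical network W
    if len(members) >= 2:
        add_layer(members, lam_f, T_f)  # online social network F

    sizes = {}
    for i in everyone:
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


print(largest_component_fraction(20000, 0.5, 1.5, 1.0, 1.5, 1.0, seed=3))
```

With $\alpha = 0.5$, $\lambda_w = \lambda_f = 1.5$ and $T_w = T_f = 1$, the returned fraction should be close to the value 0.82 quoted in Section I, up to finite-size fluctuations; setting $\lambda_f = 0$ recovers the single-network value of about 0.58.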

III. Main Results 

A. Information Diffusion in Coupled Graphs with Arbitrary 
Degree Distributions 

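We now present the main result of our paper, which characterizes
the threshold and the size of the information epidemic in H
by revealing its phase transition properties. First, for notational
convenience, let $k_f$ and $k_w$ be random variables independently
drawn from the distributions $\{p^f_k\}$ and $\{p^w_k\}$, respectively, and
let $\lambda_f := \langle k_f \rangle$ and $\lambda_w := \langle k_w \rangle$. Further, let $\beta_f$
and $\beta_w$ be given by

$$\beta_f := \frac{\langle k_f^2 \rangle - \lambda_f}{\lambda_f} \quad \text{and} \quad \beta_w := \frac{\langle k_w^2 \rangle - \lambda_w}{\lambda_w}, \tag{4}$$

and define the threshold function $\sigma^*_{fw}$ by

$$\sigma^*_{fw} := \frac{T_f\beta_f + T_w\beta_w + \sqrt{(T_f\beta_f - T_w\beta_w)^2 + 4\alpha T_f T_w \lambda_f \lambda_w}}{2}. \tag{5}$$

Finally, let $h_1, h_2 \in (0, 1]$ be given by the pointwise smallest
solution of the recursive equations

$$h_1 = \frac{T_f}{\lambda_f}\left(\sum_k k\, p^f_k\, h_1^{k-1}\right)\left(\sum_k p^w_k\, h_2^{k}\right) + 1 - T_f, \tag{6}$$

$$h_2 = \frac{T_w}{\lambda_w}\left(\sum_k k\, p^w_k\, h_2^{k-1}\right)\left(\alpha \sum_k p^f_k\, h_1^{k} + 1 - \alpha\right) + 1 - T_w. \tag{7}$$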



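Theorem 3.1: Under the assumptions just stated, we have

(i) If $\sigma^*_{fw} < 1$, then with high probability
the size of the largest component satisfies
$C_1(H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)) = o(n)$.
On the other hand, if $\sigma^*_{fw} > 1$, then
$C_1(H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)) = \Omega(n)$ whp.

(ii) Also,

$$\frac{1}{n} C_1(H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)) \xrightarrow{p} 1 - \left(\sum_k p^w_k h_2^{k}\right)\left(\alpha \sum_k p^f_k h_1^{k} + 1 - \alpha\right). \tag{8}$$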



A proof of Theorem 3.1 is given in Section VI.
Theorem 3.1 quantifies the fraction of individuals in the
overlaying social-physical network that are likely to receive
an item of information which starts spreading from a single
individual. Specifically, Theorem 3.1 shows that the critical
point of the information epidemic is marked by $\sigma^*_{fw} = 1$, with
the critical threshold $\sigma^*_{fw}$ given by (5). In other words, for any
parameter set that yields $\sigma^*_{fw} > 1$ (supercritical regime), an
item of information has a positive probability of giving rise to
an information epidemic; i.e., reaching a linear fraction of the
individuals. In that case, the probability of a node triggering
an information epidemic, and the corresponding asymptotic
fraction of individuals who receive the information, can be
found by first solving the recursive equations (6)-(7) for the
smallest $h_1, h_2$ in $(0, 1]$ and then computing the expression
given in (8). On the other hand, whenever $\sigma^*_{fw} < 1$
(subcritical regime), we conclude from Theorem 3.1 that
the number of individuals who receive the information will
be $o(n)$ with high probability, meaning that all information
outbreaks are non-epidemic.
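The recipe just described (compute $\sigma^*_{fw}$ from (4)-(5), solve (6)-(7), evaluate (8)) can be turned into a short numerical sketch. Degree distributions are passed as truncated probability lists indexed by degree; the truncation point `kmax` and the helpers `poisson_pmf`, `gen` and `gen_prime` are our own illustrative choices. Starting the iteration of (6)-(7) at $h_1 = h_2 = 0$ makes the iterates increase monotonically (the right-hand sides are nondecreasing in $h_1, h_2$), so they approach the smallest fixed point in $[0, 1]$, which is the one the theorem asks for.

```python
import math


def poisson_pmf(lam, kmax=80):
    """Poisson(lam) probabilities p_0..p_kmax (truncated; the tail is negligible here)."""
    p = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        p.append(p[-1] * lam / k)
    return p


def gen(p, h):
    """sum_k p_k h^k"""
    return sum(q * h ** k for k, q in enumerate(p))


def gen_prime(p, h):
    """sum_k k p_k h^(k-1)"""
    return sum(k * q * h ** (k - 1) for k, q in enumerate(p) if k > 0)


def theorem_3_1(pf, pw, alpha, T_f, T_w, tol=1e-13, max_iter=20_000):
    """Return (sigma*_fw, limiting giant-component fraction) from (4)-(8)."""
    lam_f, lam_w = gen_prime(pf, 1.0), gen_prime(pw, 1.0)            # mean degrees
    beta_f = sum(k * k * q for k, q in enumerate(pf)) / lam_f - 1.0  # (4)
    beta_w = sum(k * k * q for k, q in enumerate(pw)) / lam_w - 1.0
    a, b = T_f * beta_f, T_w * beta_w
    sigma = 0.5 * (a + b + math.sqrt((a - b) ** 2 + 4 * alpha * T_f * T_w * lam_f * lam_w))  # (5)

    h1 = h2 = 0.0
    for _ in range(max_iter):
        n1 = T_f / lam_f * gen_prime(pf, h1) * gen(pw, h2) + 1 - T_f                        # (6)
        n2 = T_w / lam_w * gen_prime(pw, h2) * (alpha * gen(pf, h1) + 1 - alpha) + 1 - T_w  # (7)
        converged = abs(n1 - h1) + abs(n2 - h2) < tol
        h1, h2 = n1, n2
        if converged:
            break
    giant = 1.0 - gen(pw, h2) * (alpha * gen(pf, h1) + 1 - alpha)  # (8)
    return sigma, giant


sigma, giant = theorem_3_1(poisson_pmf(1.5), poisson_pmf(1.5), alpha=0.5, T_f=1.0, T_w=1.0)
```

In the subcritical regime the iteration converges to $h_1 = h_2 = 1$, and (8) then returns a giant-component fraction of (numerically) zero, consistent with part (i) of the theorem.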

It is of interest to ask whether or not Theorem 3.1 can be
deduced from the phase transition results for random graphs
with arbitrary degree distributions (e.g., see [35], [20], [31]).
It is well known [35] that for these graphs the critical point
of the phase transition is given by

$$\frac{\mathbb{E}[d_i(d_i - 1)]}{\mathbb{E}[d_i]} = 1,$$

where $d_i$ is the degree of an arbitrary node. We next show
that this condition is not equivalent (and indeed is not even a
good approximation) to $\sigma^*_{fw} = 1$.

To this end, we consider a basic scenario where F and W are
both Erdos-Renyi graphs [33], so that their degree distributions
are (asymptotically) Poisson, i.e., we have $p^w_k = e^{-\lambda_w}\lambda_w^k/k!$
and $p^f_k = e^{-\lambda_f}\lambda_f^k/k!$. Given that each link in F (resp. in W) is
occupied with probability $T_f$ (resp. $T_w$), the occupied degree
of an arbitrary node $i$ in H follows a Poisson distribution
with mean $T_w\lambda_w$ if $i \notin \mathcal{N}_F$ (which happens with probability
$1 - \alpha$), and it follows a Poisson distribution with mean
$T_f\lambda_f + T_w\lambda_w - T_f\lambda_f T_w\lambda_w / n$ if $i \in \mathcal{N}_F$ (which happens with
probability $\alpha$); the last term accounts for node pairs that are linked
in both networks. When $n$ becomes large this leads to

$$\frac{\mathbb{E}[d_i(d_i - 1)]}{\mathbb{E}[d_i]} = \frac{\alpha(T_f\lambda_f + T_w\lambda_w)^2 + (1 - \alpha)(T_w\lambda_w)^2}{\alpha T_f\lambda_f + T_w\lambda_w}. \tag{9}$$



It can be seen that the above expression is not equal to the
corresponding quantity $\sigma^*_{fw}$. As discussed in the next
subsection, for the given degree distributions we have $\sigma^*_{fw} = \lambda^*_{fw}$,
where $\lambda^*_{fw}$ is given by (14). For instance, with $\alpha = 0.2$,
$T_w\lambda_w = 0.6$ and $T_f\lambda_f = 0.8$, we have $\sigma^*_{fw} = \lambda^*_{fw} = 1.03$,
while (9) yields 0.89, signaling a significant difference between
the exact threshold $\lambda^*_{fw}$ and the approximation given by (9).
Notably, the two values fall on opposite sides of 1, so the
approximation would wrongly predict that no information epidemic can occur.
We conclude that the results established in Theorem 3.1 (for
coupled random graphs) go beyond the classical results for
single random graphs with arbitrary degree distributions.
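The numerical comparison in the preceding paragraph is easy to reproduce. The sketch below evaluates the single-network criterion (9) and the exact threshold (5) specialized to Poisson degrees ($\beta_f = \lambda_f$, $\beta_w = \lambda_w$); the shorthand $b_f = T_f\lambda_f$ and $b_w = T_w\lambda_w$ and the function names are our own.

```python
import math


def naive_ratio(alpha, bf, bw):
    """E[d(d-1)] / E[d] for the occupied degree in coupled ER graphs, eq. (9)."""
    num = alpha * (bf + bw) ** 2 + (1 - alpha) * bw ** 2
    den = alpha * bf + bw
    return num / den


def exact_threshold(alpha, bf, bw):
    """sigma*_fw from (5) with Poisson degrees, i.e. beta_f = lambda_f, beta_w = lambda_w."""
    return 0.5 * (bf + bw + math.sqrt((bf - bw) ** 2 + 4 * alpha * bf * bw))


print(round(naive_ratio(0.2, 0.8, 0.6), 2))      # 0.89 (would suggest subcritical)
print(round(exact_threshold(0.2, 0.8, 0.6), 2))  # 1.03 (actually supercritical)
```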

Aside from the critical threshold and the fractional size of 
information epidemics, we are also interested in computing 
the average size of information outbreaks in the subcritical 
regime for a fuller understanding of the information propagation
process. In other words, in the case where the fraction of
informed individuals tends to zero, we wish to compute the
expected number of informed nodes. For a given network with
nodes $1, \ldots, n$, the average outbreak size $\langle s \rangle$ is given by
$\frac{1}{n}\sum_{i=1}^{n} s(i)$, where $s(i)$ is the number of nodes that receive
an item of information started from node $i$; i.e., $s(i)$ is the size of the
connected component containing node $i$.

Now, let $\langle s \rangle := \langle s(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f) \rangle$ denote
the average outbreak size in $H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f)$. It is
easy to check that

$$\langle s \rangle = \sum_{j=1}^{N_c} \frac{1}{n}\left(C_j(H(n; \alpha, \{p^w_k\}, T_w, \{p^f_k\}, T_f))\right)^2, \tag{10}$$

where, as before, $C_j$ gives the size of the $j$th largest
component of the network, and $N_c$ denotes the total number of
components. To see (10), observe that an arbitrarily selected
node will belong to a component of size $C_j$ with probability
$C_j/n$, in which case an item of information started from that particular
node will create an outbreak of size $C_j$. Summing over all
components of the network, we get (10). In the supercritical
regime, we have $C_1(H) = \Omega(n)$ so that $\langle s \rangle \to \infty$. The
next result, established in Section VII, allows computing this
quantity in the subcritical regime.
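Equation (10) says that averaging outbreak sizes over starting nodes is the same as weighting each component by its own size. A tiny sketch makes the identity concrete; the component sizes used are a made-up toy example, not data from the paper.

```python
def mean_outbreak_size(component_sizes):
    """<s> via (10): sum_j C_j^2 / n, with n the total number of nodes."""
    n = sum(component_sizes)
    return sum(c * c for c in component_sizes) / n


def mean_outbreak_size_bruteforce(component_sizes):
    """Same quantity as the node average (1/n) * sum_i s(i)."""
    s = [c for c in component_sizes for _ in range(c)]  # s(i) for every node i
    return sum(s) / len(s)


sizes = [5, 3, 1, 1]                         # toy network with n = 10 nodes
print(mean_outbreak_size(sizes))             # 3.6
print(mean_outbreak_size_bruteforce(sizes))  # 3.6
```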

Theorem 3.2: Let $\sigma^*_{fw} < 1$. With the above assumptions, let
$s_1, s_2$ denote the simultaneous stable solution of the equations

$$s_1 = T_f + \beta_f T_f s_1 + \lambda_w T_f s_2, \tag{11}$$

$$s_2 = T_w + \alpha\lambda_f T_w s_1 + \beta_w T_w s_2. \tag{12}$$

Then, the average outbreak size satisfies

$$\langle s \rangle \to 1 + \alpha\lambda_f s_1 + \lambda_w s_2. \tag{13}$$
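Because (11)-(12) are linear in $s_1, s_2$, Theorem 3.2 reduces to solving a 2×2 system. The sketch below does so with Cramer's rule; the function name is illustrative, and $\beta_f, \beta_w$ are passed in directly (for Poisson degrees they equal $\lambda_f, \lambda_w$). The guard reflects the theorem's requirement $\sigma^*_{fw} < 1$: the coefficient matrix below has largest eigenvalue exactly $\sigma^*_{fw}$ from (5).

```python
import math


def mean_outbreak_subcritical(alpha, lam_f, beta_f, T_f, lam_w, beta_w, T_w):
    """<s> from (11)-(13), written as (I - M) s = (T_f, T_w) and solved by Cramer's rule."""
    m11, m12 = beta_f * T_f, lam_w * T_f          # coefficients of s1, s2 in (11)
    m21, m22 = alpha * lam_f * T_w, beta_w * T_w  # coefficients of s1, s2 in (12)
    # Largest eigenvalue of M; it coincides with sigma*_fw in (5).
    rho = 0.5 * (m11 + m22 + math.sqrt((m11 - m22) ** 2 + 4 * m12 * m21))
    if rho >= 1:
        raise ValueError("Theorem 3.2 requires sigma*_fw < 1")
    a, b, c, d = 1 - m11, -m12, -m21, 1 - m22     # entries of I - M
    det = a * d - b * c                           # positive when rho < 1
    s1 = (T_f * d - b * T_w) / det
    s2 = (a * T_w - c * T_f) / det
    return 1 + alpha * lam_f * s1 + lam_w * s2    # (13)


# Poisson degrees (beta = lambda), alpha = 0.5, lambda_f = lambda_w = 0.4, T = 1
print(round(mean_outbreak_subcritical(0.5, 0.4, 0.4, 1.0, 0.4, 0.4, 1.0), 3))  # 2.857
```

As a sanity check only (formally outside the model, since $\alpha \in (0, 1]$), setting $\alpha = 0$ with Poisson degrees reduces the result to the familiar single-network value $1/(1 - T_w\lambda_w)$.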



B. Special Case: Information Diffusion in Coupled ER Graphs

A special case of interest is when both W and F are
Erdos-Renyi graphs [33]. More specifically, let $W = W(n; \lambda_w/n)$ be
an ER network on the vertices $\{1, \ldots, n\}$ such that there exists
an edge between any pair of distinct nodes $i, j = 1, \ldots, n$ with
probability $\lambda_w/n$; this ensures that the mean degree of each node
is asymptotically equal to $\lambda_w$. Next, obtain a set of vertices $\mathcal{N}_F$
by picking each node $1, \ldots, n$ independently with probability
$\alpha \in (0, 1]$. Now, let $F = F(n; \alpha, \lambda_f/(\alpha n))$ be an ER graph
on the vertex set $\mathcal{N}_F$ with edge probability given by $\lambda_f/(\alpha n)$. The
mean degree of a node in F is given (asymptotically) by $\lambda_f$,
as seen via (2).

Given that the degree distributions are asymptotically Poisson
in ER graphs, this special case is covered by our model
presented in Section III-A by setting $p^w_k = e^{-\lambda_w}\lambda_w^k/k!$ and
$p^f_k = e^{-\lambda_f}\lambda_f^k/k!$. Thus, Theorem 3.1 is still valid and can be
used to obtain the condition and expected size of information
epidemics. However, recent developments on inhomogeneous
random graphs [38] enable us to obtain more detailed results
than those given by Theorem 3.1 for this special case.

Consider now an overlay network model H constructed on 
the vertices 1, . . . , n by conjoining the occupied edges of W 
and F, i.e., we have H(n; a, T w X w ,T f X f ) = W(n; T w X w /n)U 
¥(n;a,TfXf/{an)). Let X* fw be defined by 

A}™ := \{T f X f +T w X w ) (14) 
+ \\/(TfX f +T w X w ) 2 - 4(1 - a)T f X f T w X w . 

Also, let $\rho_1, \rho_2$ be the pointwise largest solution of the recursive equations

$$\rho_1 = 1 - \exp\{-\rho_1(\alpha\lambda_w T_w + \lambda_f T_f) - \rho_2(1-\alpha)\lambda_w T_w\}$$
$$\rho_2 = 1 - \exp\{-\rho_1\alpha\lambda_w T_w - \rho_2(1-\alpha)\lambda_w T_w\} \qquad (15)$$

with $\rho_1, \rho_2$ in $[0, 1]$.
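Equations (14)-(15) are easy to evaluate numerically. The sketch below uses the shorthand $a = T_f\lambda_f$ and $b = T_w\lambda_w$ (our notation) and hypothetical helper names. It computes $\lambda^\star_{fw}$ and finds the pointwise largest solution of (15) by fixed-point iteration started from $\rho_1 = \rho_2 = 1$. Because the right-hand sides of (15) are nondecreasing in $(\rho_1, \rho_2)$, the iterates decrease monotonically to the largest fixed point. The function returns $\alpha\rho_1 + (1-\alpha)\rho_2$, the limit that appears in part (ii) of Theorem 3.3 stated next.

```python
import math


def lambda_star(alpha, a, b):
    """Critical threshold (14), with a = T_f*lambda_f and b = T_w*lambda_w."""
    return 0.5 * (a + b + math.sqrt((a + b) ** 2 - 4 * (1 - alpha) * a * b))


def giant_fraction_er(alpha, a, b, tol=1e-12, max_iter=100_000):
    """Largest solution of (15), iterated from (1, 1); returns alpha*rho1 + (1-alpha)*rho2."""
    rho1 = rho2 = 1.0
    for _ in range(max_iter):
        new1 = 1 - math.exp(-rho1 * (alpha * b + a) - rho2 * (1 - alpha) * b)
        new2 = 1 - math.exp(-rho1 * alpha * b - rho2 * (1 - alpha) * b)
        converged = abs(new1 - rho1) + abs(new2 - rho2) < tol
        rho1, rho2 = new1, new2
        if converged:
            break
    return alpha * rho1 + (1 - alpha) * rho2


# alpha = 0.5 and T_f*lambda_f = T_w*lambda_w = 1.0 lies above the threshold.
print(lambda_star(0.5, 1.0, 1.0), giant_fraction_er(0.5, 1.0, 1.0))
```

For $\alpha = 1$ the two equations collapse to the classical ER relation $\rho_1 = 1 - e^{-\rho_1(a+b)}$, which is a convenient check of the implementation.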

Theorem 3.3: With the above assumptions, we have
(i) If $\lambda^\star_{fw} < 1$, then with high probability the size of the largest component satisfies $C_1(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) = O(\log n)$; in contrast, if $\lambda^\star_{fw} > 1$ we have $C_1(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) = \Theta(n)$ whp, while the size of the second largest component satisfies $C_2(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) = O(\log n)$.
(ii) Moreover,

$$\frac{1}{n}\, C_1(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) \xrightarrow{p} \alpha\rho_1 + (1-\alpha)\rho_2.$$

A proof of Theorem 3.3 is given in Section VII.
Fig. 1. The minimum $\lambda_f T_f$ required for the existence of a giant component in $\mathbb{H}(n; \alpha, T_w\lambda_w, T_f\lambda_f)$ versus $\lambda_w T_w$ for various $\alpha$ values. In other words, each curve corresponds to the boundary of the phase transition for the corresponding $\alpha$ value. Above the boundary there exists a giant component, but below it all components have $O(\log n)$ nodes.



Theorem 3.3 is a counterpart of Theorem 3.1. This time, the critical point of the information epidemic is marked by $\lambda^\star_{fw} = 1$, with the critical threshold $\lambda^\star_{fw}$ given by (14). With $p^w_k = e^{-\lambda_w}\lambda_w^k/k!$ and $p^f_k = e^{-\lambda_f}\lambda_f^k/k!$, we have $\beta_f = \lambda_f$ and $\beta_w = \lambda_w$, and it is easy to check that $\sigma^\star_{fw} = \lambda^\star_{fw}$, so that part (i) of Theorem 3.3 is compatible with part (i) of Theorem 3.1. Also, we find (numerically) that the second parts of Theorems 3.3 and 3.1 yield the same asymptotic giant component size. Nevertheless, it is worth noting that Theorem 3.3 is not a corollary of Theorem 3.1. This is because, through a different proof technique, Theorem 3.3 provides the sharper bounds $C_1(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) = O(\log n)$ (subcritical case) and $C_2(\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)) = O(\log n)$ (supercritical case), which go beyond Theorem 3.1.

We observe that the threshold function $\lambda^\star_{fw}$ is symmetric in $T_f\lambda_f$ and $T_w\lambda_w$, meaning that both networks play identical roles in carrying the conjoint network to the supercritical regime, where information can reach a linear fraction of the nodes. To get a more concrete sense, we depict in Figure 1 the minimum $\lambda_f T_f$ required to have a giant component in $\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)$ versus $\lambda_w T_w$ for various $\alpha$ values. Each curve in the figure corresponds to a phase transition boundary above which information epidemics are possible. If $T_f = T_w = 1$, the same plot shows the boundary of giant component existence with respect to the mean degrees $\lambda_f$ and $\lambda_w$. This clearly shows how two networks that are each in the subcritical regime can yield an information epidemic when they are conjoined. For instance, we see that for $\alpha = 0.1$, it suffices to have $\lambda_f = \lambda_w = 0.76$ for the existence of an information epidemic. Yet, if the two networks were disjoint, it would be necessary [33] to have $\lambda_f > 1$ or $\lambda_w > 1$.
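The boundary curves of Figure 1 admit a closed form. Writing $a = T_f\lambda_f$ and $b = T_w\lambda_w$ and squaring (14), the condition $\lambda^\star_{fw} = 1$ with $a, b < 1$ becomes $(1-a)(1-b) = \alpha ab$, i.e., $a = (1-b)/(1-(1-\alpha)b)$. This identity is our own short derivation, not a formula stated in the text. The hypothetical helper below evaluates the minimum $a$ for a given $b$ and can be checked against (14).

```python
import math


def lambda_star(alpha, a, b):
    """Critical threshold (14), with a = T_f*lambda_f and b = T_w*lambda_w."""
    return 0.5 * (a + b + math.sqrt((a + b) ** 2 - 4 * (1 - alpha) * a * b))


def min_tf_lambda_f(alpha, b):
    """Smallest a = T_f*lambda_f with lambda_star(alpha, a, b) >= 1, for b = T_w*lambda_w."""
    if b >= 1:
        return 0.0  # W is already supercritical on its own
    # Solve (1 - a)(1 - b) = alpha * a * b for a.
    return (1 - b) / (1 - (1 - alpha) * b)


# A few points on each boundary curve, in the spirit of Figure 1.
for alpha in (0.1, 0.5, 0.9):
    curve = [(b, round(min_tf_lambda_f(alpha, b), 3)) for b in (0.0, 0.25, 0.5, 0.75)]
    print(alpha, curve)
```

For $\alpha = 0.1$ and $b = 0.76$ the helper returns $a \approx 0.76$, which is the symmetric example mentioned above.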

We elaborate further on Theorem 3.3. First, we note from the classical results [33] that ER graphs have a giant component whenever the average node degree exceeds one. This is compatible with part (i) of Theorem 3.3, since the condition for giant component existence reduces to $T_f\lambda_f > 1$ if $T_w\lambda_w = 0$, and to $T_w\lambda_w > 1$ when $T_f\lambda_f = 0$. Finally, in the case where $\alpha = 1$ (i.e., when everyone in the population is a member of Facebook), the graph $\mathbb{H}$ reduces to an ER graph with edge probability

$$\frac{T_f\lambda_f + T_w\lambda_w}{n} - \frac{T_f\lambda_f T_w\lambda_w}{n^2},$$

leading to a mean node degree of $T_f\lambda_f + T_w\lambda_w$ in the asymptotic regime. As expected, for the case $\alpha = 1$, Theorem 3.3 reduces to classical results for ER graphs, as we see that $\lambda^\star_{fw} = T_f\lambda_f + T_w\lambda_w$ and $\frac{1}{n}C_1(\mathbb{H}) \xrightarrow{p} \rho_1$, where $\rho_1$ is the largest solution of $\rho_1 = 1 - e^{-\rho_1(T_f\lambda_f + T_w\lambda_w)}$.

IV. Numerical Results 

A. ER Networks 

We first study the case where both the physical information network W and the online social network F are Erdős-Rényi graphs. As in Section III-B, let $\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f) = W(n;T_w\lambda_w/n) \cup F(n;\alpha,T_f\lambda_f/(\alpha n))$ be the conjoint social-physical network, where W is defined on the vertices $\{1,\ldots,n\}$, whereas the vertex set of F is obtained by picking each node $1,\ldots,n$ independently with probability $\alpha$. The information transmissibilities are equal to $T_w$ and $T_f$ in W and F, respectively, so that the mean degrees of the occupied networks are given (asymptotically) by $T_w\lambda_w$ and $T_f\lambda_f$, respectively.
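Experiments of the kind summarized in Figure 2 can be reproduced at a smaller scale with a short Monte Carlo sketch. This is our own illustration, not the authors' experimental code. It samples the occupied networks directly as $W(n; T_w\lambda_w/n)$ and $F(n;\alpha,T_f\lambda_f/(\alpha n))$, generates each ER graph with the Batagelj-Brandes geometric-skipping method, merges the edge sets, and measures $C_1(\mathbb{H})/n$ with union-find. The theoretical value from (15) is computed alongside for comparison. All function names and parameters are illustrative.

```python
import math
import random


def gnp_edges(nodes, p, rng):
    """Yield the edges of an ER graph G(len(nodes), p) on `nodes` (assumes 0 < p < 1).

    Batagelj-Brandes geometric skipping: cost O(n + m) instead of O(n^2).
    """
    n = len(nodes)
    log_q = math.log(1.0 - p)
    v, w = 1, -1
    while v < n:
        w += 1 + int(math.log(1.0 - rng.random()) / log_q)
        while w >= v and v < n:
            w -= v
            v += 1
        if v < n:
            yield nodes[v], nodes[w]


def largest_component(n, edges):
    """Size of the largest connected component, via union-find with path halving."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    counts = {}
    for x in range(n):
        root = find(x)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values())


def simulate_giant_fraction(n, alpha, a, b, seed=0):
    """One sample of C1(H)/n, with a = T_f*lambda_f and b = T_w*lambda_w."""
    rng = random.Random(seed)
    nodes = list(range(n))
    members = [i for i in nodes if rng.random() < alpha]  # vertex set N_F
    edges = list(gnp_edges(nodes, b / n, rng))              # occupied edges of W
    edges += gnp_edges(members, a / (alpha * n), rng)       # occupied edges of F
    return largest_component(n, edges) / n


def theory_giant_fraction(alpha, a, b, iters=5000):
    """alpha*rho1 + (1-alpha)*rho2 from (15), iterating down from rho1 = rho2 = 1."""
    r1 = r2 = 1.0
    for _ in range(iters):
        r1, r2 = (1 - math.exp(-r1 * (alpha * b + a) - r2 * (1 - alpha) * b),
                  1 - math.exp(-r1 * alpha * b - r2 * (1 - alpha) * b))
    return alpha * r1 + (1 - alpha) * r2


print(simulate_giant_fraction(20_000, 0.5, 1.0, 1.0, seed=1), theory_giant_fraction(0.5, 1.0, 1.0))
```

Averaging `simulate_giant_fraction` over many seeds, as done with 200 runs per point in the experiments, reduces the sampling noise of a single run.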

We plot in Figure 2 the fractional size of the giant component in $\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)$ versus $T_f\lambda_f = T_w\lambda_w$ for various $\alpha$ values. In other words, the plots illustrate the largest fraction of individuals that a particular item of information can reach. In this figure, the curves stand for the analytical results obtained by Theorem 3.3, whereas marked points stand for the experimental results obtained with $n = 2 \times 10^5$ nodes by averaging 200 experiments for each data point. Clearly, there is an excellent match between the theoretical and experimental results. It is also seen that the critical threshold for the existence of a giant component (i.e., an information epidemic) is given by $T_f\lambda_f = T_w\lambda_w = 0.760$ when $\alpha = 0.1$, $T_f\lambda_f = T_w\lambda_w = 0.586$ when $\alpha = 0.5$, and $T_f\lambda_f = T_w\lambda_w = 0.514$ when $\alpha = 0.9$. It is easy to check that these values are in close agreement with the theoretically obtained critical threshold $\lambda^\star_{fw}$ given by (14).
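As a quick check of these thresholds, note that for the symmetric choice $T_f\lambda_f = T_w\lambda_w = x$ the expression (14) simplifies (this short derivation is ours):

$$\lambda^\star_{fw} = \tfrac{1}{2}\left(2x + \sqrt{4x^2 - 4(1-\alpha)x^2}\right) = x\left(1 + \sqrt{\alpha}\right), \qquad \text{so} \qquad \lambda^\star_{fw} = 1 \iff x = \frac{1}{1+\sqrt{\alpha}}.$$

This gives $x \approx 0.760$, $0.586$ and $0.513$ for $\alpha = 0.1$, $0.5$ and $0.9$, respectively.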

In the inset of Figure 2, we show the average outbreak size $\langle s\rangle$ versus $T_f\lambda_f = T_w\lambda_w$ under the same setting. Namely, the curves stand for the analytical results obtained from Theorem 3.2, while the marked points are obtained by averaging the quantity given in (10) over 200 independent experiments. We see that the experimental results are in excellent agreement with our analytical results. Also, as expected, the average outbreak size $\langle s\rangle$ is seen to grow unboundedly as $T_f\lambda_f = T_w\lambda_w$ approaches the corresponding epidemic threshold.

Fig. 2. The fractional size of the giant component in $\mathbb{H}(n;\alpha,T_w\lambda_w,T_f\lambda_f)$ versus $T_f\lambda_f = T_w\lambda_w$. The curves correspond to analytical results obtained from Theorem 3.3, whereas marked points stand for the experimental results obtained with $n = 2 \times 10^5$ by averaging 200 experiments for each point. (Inset) Average outbreak size $\langle s\rangle$ versus $T_f\lambda_f = T_w\lambda_w$ under the same setting.

B. Networks with Power Degree Distributions

In order to gain more insight into the consequences of Theorem 3.1 for real-world networks, we now consider a specific example of information diffusion when the physical information network W and the online social network F have power-law degree distributions with exponential cutoff. Specifically, we let

$$p^w_k = \begin{cases} 0 & \text{if } k = 0 \\ \left(\mathrm{Li}_{\gamma_w}(e^{-1/\Gamma_w})\right)^{-1} k^{-\gamma_w} e^{-k/\Gamma_w} & \text{if } k = 1, 2, \ldots \end{cases} \qquad (16)$$

and

$$p^f_k = \begin{cases} 0 & \text{if } k = 0 \\ \left(\mathrm{Li}_{\gamma_f}(e^{-1/\Gamma_f})\right)^{-1} k^{-\gamma_f} e^{-k/\Gamma_f} & \text{if } k = 1, 2, \ldots \end{cases} \qquad (17)$$

where $\gamma_w$, $\gamma_f$, $\Gamma_w$ and $\Gamma_f$ are positive constants and the normalizing constant $\mathrm{Li}_m(z)$ is the mth polylogarithm of z, i.e., $\mathrm{Li}_m(z) = \sum_{k=1}^{\infty} z^k/k^m$.

Power-law distributions with exponential cutoff are chosen here because they have been applied to a variety of real-world networks [20], [27]. In fact, a detailed empirical study on the degree distributions of real-world networks [34] revealed that the Internet (at the level of autonomous systems), the phone call network, the e-mail network, and the web link network all exhibit power-law degree distributions with exponential cutoff.

To apply Theorem 3.1, we first compute the epidemic threshold $\sigma^\star_{fw}$. Under (16)-(17) we find that

$$\lambda_f = \frac{\mathrm{Li}_{\gamma_f - 1}(e^{-1/\Gamma_f})}{\mathrm{Li}_{\gamma_f}(e^{-1/\Gamma_f})}, \qquad \beta_f = \frac{\mathrm{Li}_{\gamma_f - 2}(e^{-1/\Gamma_f})}{\mathrm{Li}_{\gamma_f - 1}(e^{-1/\Gamma_f})} - 1.$$

Similar expressions can be derived for $\lambda_w$ and $\beta_w$. It is now a simple matter to compute the critical threshold $\sigma^\star_{fw}$ of Theorem 3.1 using the above relations. Then, we can use Theorem 3.1(i) to check whether or not an item of information can reach a linear fraction of individuals in the conjoint social-physical network $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f) = W(n;\{p^w_k\},T_w) \cup F(n;\alpha,\{p^f_k\},T_f)$.

Fig. 3. The minimum $T_f$ required for the existence of a giant component in $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ versus $T_w$. The distributions $\{p^w_k\}$ and $\{p^f_k\}$ are given by (16) and (17), with $\gamma_f = \gamma_w = 2.5$ and $\Gamma_f = \Gamma_w = 10$. The $T_f$ and $T_w$ values are multiplied by the corresponding $\beta_f$ and $\beta_w$ values to provide a fair comparison with the disjoint network case; under the current setting we have $\beta_f = \beta_w = 1.545$.

To that end, we depict in Figure 3 the minimum $T_f$ value required to have a giant component in $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ versus $T_w$, for various $\alpha$ values. In other words, each curve corresponds to a phase transition boundary above which information epidemics are possible, in the sense that an information item has a positive probability of reaching a linear fraction of individuals in the overlaying social-physical network. In all plots, we set $\gamma_f = \gamma_w = 2.5$ and $\Gamma_f = \Gamma_w = 10$. The $T_f$ and $T_w$ values are multiplied by the corresponding $\beta_f$ and $\beta_w$ values to make a fair comparison with the disjoint network case, where it is required [20] to have $\beta_w T_w > 1$ (or $\beta_f T_f > 1$) for the existence of an epidemic; under the current setting we have $\beta_f = \beta_w = 1.545$. Figure 3 illustrates how conjoining two networks can speed up information diffusion. It can be seen that even for small $\alpha$ values, two networks that individually have no giant component can yield an information epidemic when they are conjoined. As an example, for $\alpha = 0.1$ it suffices to have $\beta_f T_f = \beta_w T_w = 0.774$ for the existence of an information epidemic in the conjoint network $\mathbb{H}$, whereas if the networks W and F are disjoint, an information epidemic can occur only if $\beta_w T_w > 1$ or $\beta_f T_f > 1$.
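The moments under (16)-(17) and the resulting threshold are straightforward to compute. The sketch below evaluates the polylogarithm by truncating its defining series, which is adequate here because $0 < e^{-1/\Gamma} < 1$. It returns $\lambda = \mathbb{E}[k]$ and $\beta = \mathbb{E}[k^2 - k]/\mathbb{E}[k]$. For the symmetric setting of Figure 3 ($\gamma_f = \gamma_w$, $\Gamma_f = \Gamma_w$, $T_f = T_w$) it uses $\sigma^\star_{fw} = T\beta + \sqrt{\alpha}\,T\lambda$, which follows from the spectral radius of the 2×2 Jacobian in Section VI, to obtain the critical value of $\beta T$. The truncation length and function names are our assumptions.

```python
import math


def polylog(s, z, terms=5000):
    """Truncated series Li_s(z) = sum_{k>=1} z^k / k^s (adequate for 0 < z < 1)."""
    return sum(z ** k / k ** s for k in range(1, terms + 1))


def moments(gamma, cutoff):
    """(lambda, beta) = (E[k], E[k^2 - k] / E[k]) for the distribution (16)/(17)."""
    z = math.exp(-1.0 / cutoff)
    li_g, li_g1, li_g2 = polylog(gamma, z), polylog(gamma - 1, z), polylog(gamma - 2, z)
    return li_g1 / li_g, li_g2 / li_g1 - 1.0


def sigma_star(alpha, lam_f, lam_w, beta_f, beta_w, t_f, t_w):
    """Spectral radius of [[T_f beta_f, T_f lam_w], [T_w alpha lam_f, T_w beta_w]]."""
    a, b = t_f * beta_f, t_w * beta_w
    return 0.5 * (a + b + math.sqrt((a - b) ** 2 + 4 * alpha * t_f * t_w * lam_f * lam_w))


def critical_t_beta(alpha, gamma, cutoff):
    """Critical T*beta when both networks share (gamma, cutoff) and T_f = T_w = T.

    Here sigma_star = T*beta + sqrt(alpha)*T*lambda, so sigma_star = 1 is reached
    at T*beta = 1 / (1 + sqrt(alpha) * lambda / beta).
    """
    lam, beta = moments(gamma, cutoff)
    return 1.0 / (1.0 + math.sqrt(alpha) * lam / beta)


lam, beta = moments(2.5, 10)
print(lam, beta, [round(critical_t_beta(a, 2.5, 10), 3) for a in (0.1, 0.5, 0.9)])
```

With $\gamma = 2.5$ and $\Gamma = 10$ this yields values close to the $\beta \approx 1.545$ and $\beta T \approx 0.774$ (for $\alpha = 0.1$) quoted above.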



Fig. 4. The fractional size of the giant component in $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ versus $T_f\beta_f = T_w\beta_w$. The distributions $\{p^w_k\}$ and $\{p^f_k\}$ are given by (16) and (17), with $\gamma_f = \gamma_w = 2.5$ and $\Gamma_f = \Gamma_w = 10$. The $T_f$ and $T_w$ values are multiplied by the corresponding $\beta_f$ and $\beta_w$ values for a fair comparison with the disjoint network case; under the current setting we have $\beta_f = \beta_w = 1.545$. The curves were obtained analytically via Theorem 3.1, whereas the marked points stand for the experimental results obtained with $n = 2 \times 10^5$ nodes by averaging 200 experiments for each parameter set. We see that there is excellent agreement between theory and experiments. (Inset) Average outbreak size $\langle s\rangle$ versus $\beta_f T_f = \beta_w T_w$ under the same setting.



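Next, we turn to the computation of the giant component size. We note that, under (17),

$$\mathbb{E}\left[h_1^{k_f}\right] = \frac{\mathrm{Li}_{\gamma_f}(h_1 e^{-1/\Gamma_f})}{\mathrm{Li}_{\gamma_f}(e^{-1/\Gamma_f})},$$

and similar expressions can be derived for $\mathbb{E}[k_f h_1^{k_f-1}]$, $\mathbb{E}[h_2^{k_w}]$ and $\mathbb{E}[k_w h_2^{k_w-1}]$. Now, for any given set of parameters $\gamma_f, \gamma_w, \Gamma_f, \Gamma_w, T_f, T_w, \alpha$, we can numerically obtain the giant component size of $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ by substituting the above relations into part (ii) of Theorem 3.1.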
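Part (ii) of Theorem 3.1 can also be evaluated for arbitrary (truncated) degree distributions by iterating the $x = 1$ versions of (23)-(24) from $h_1 = h_2 = 0$. Because the maps are nondecreasing, the iterates increase to the smallest fixed point in $[0,1]$. The sketch below writes out (23)-(25) at $x = 1$ using the fact that $d_f$ equals $k_f$ with probability $\alpha$ and zero otherwise. It assumes the pmfs are given as Python lists, and the function names are hypothetical.

```python
import math


def pgf_terms(pk, h):
    """Return (E[h^k], E[k h^(k-1)]) for a truncated pmf pk[0], pk[1], ..."""
    g = sum(p * h ** k for k, p in enumerate(pk))
    dg = sum(k * p * h ** (k - 1) for k, p in enumerate(pk) if k > 0)
    return g, dg


def giant_fraction(alpha, pk_f, pk_w, t_f, t_w, tol=1e-12, max_iter=100_000):
    """1 - H(1) from (23)-(25) at x = 1; d_f = k_f w.p. alpha and 0 otherwise."""
    lam_f = sum(k * p for k, p in enumerate(pk_f))
    lam_w = sum(k * p for k, p in enumerate(pk_w))
    h1 = h2 = 0.0  # monotone iteration from below reaches the smallest fixed point
    for _ in range(max_iter):
        g_f, dg_f = pgf_terms(pk_f, h1)
        g_w, dg_w = pgf_terms(pk_w, h2)
        new1 = t_f * (dg_f / lam_f) * g_w + 1 - t_f                        # (23), x = 1
        new2 = t_w * (dg_w / lam_w) * (alpha * g_f + 1 - alpha) + 1 - t_w  # (24), x = 1
        converged = abs(new1 - h1) + abs(new2 - h2) < tol
        h1, h2 = new1, new2
        if converged:
            break
    g_f, _ = pgf_terms(pk_f, h1)
    g_w, _ = pgf_terms(pk_w, h2)
    return 1 - g_w * (alpha * g_f + 1 - alpha)  # 1 - H(1), cf. (25)


def poisson_pmf(lam, kmax=80):
    """Truncated Poisson pmf, used here as an ER test case."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax + 1)]


print(giant_fraction(0.5, poisson_pmf(1.0), poisson_pmf(1.0), 1.0, 1.0))
```

For the power-law case, one can pass pmfs built from (16)-(17) truncated at a K for which $e^{-K/\Gamma}$ is negligible. With Poisson inputs and $\alpha = 1$, the result reduces to the ER giant component of mean degree $T_f\lambda_f + T_w\lambda_w$.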

To this end, Figure 4 depicts the fractional size of the giant component in $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ versus $T_f\beta_f = T_w\beta_w$, for various $\alpha$ values; as before, we set $\gamma_f = \gamma_w = 2.5$ and $\Gamma_f = \Gamma_w = 10$, yielding $\beta_f = \beta_w = 1.545$. In other words, the plots show the largest fraction of individuals in the social-physical network who receive an information item that started spreading from a single individual. In Figure 4, the curves were obtained analytically via Theorem 3.1, whereas the marked points stand for the experimental results obtained with $n = 2 \times 10^5$ nodes by averaging 200 experiments for each parameter set. We see that there is excellent agreement between theory and experiment. Moreover, according to the experiments, the critical threshold for the existence of a giant component (i.e., an information epidemic) appears at $T_f\beta_f = T_w\beta_w = 0.78$ when $\alpha = 0.1$, $T_f\beta_f = T_w\beta_w = 0.61$ when $\alpha = 0.5$, and $T_f\beta_f = T_w\beta_w = 0.53$ when $\alpha = 0.9$. These values are in close agreement with the theoretically obtained critical threshold $\sigma^\star_{fw}$.

The inset of Figure 4 shows the average outbreak size $\langle s\rangle$ versus $\beta_f T_f = \beta_w T_w$ under the same setting. To avoid the finite-size effect (observed by Newman [20] as well) near the epidemic threshold, we increased the network size up to $n = 30 \times 10^6$ to obtain a better fit. Again, we see that the experimental results (obtained by averaging the quantity (10) over 200 independent experiments) agree well with the analytical results of Theorem 3.2.



V. Online Social Networks with o(n) nodes

Until now, we have assumed that, apart from the physical network W on n nodes, information can spread over an online social network that has $\Omega(n)$ members. However, one may also wonder what would happen if the number of nodes in the online network is a sub-linear fraction of n. For instance, consider an online social network F whose vertices are selected by picking each node $1,\ldots,n$ with probability $n^{\gamma-1}$, where $0 < \gamma < 1$. This would yield a vertex set $\mathcal{N}_F$ that satisfies

$$|\mathcal{N}_F| \le n^{\gamma}(1+\epsilon) \qquad (18)$$

with high probability for any $\epsilon > 0$. We now show that, asymptotically, social networks with $n^\gamma$ nodes have almost no effect on spreading information. We start by establishing an upper bound on the size of the giant component in $\mathbb{H} = W \cup F$.

Proposition 5.1: Let W be a graph on vertices $1,\ldots,n$, and F be a graph on the vertex set $\mathcal{N}_F \subseteq \{1,\ldots,n\}$. With $\mathbb{H} = W \cup F$, we have

$$C_1(\mathbb{H}) \le C_1(W) + C_2(W)\left(|\mathcal{N}_F| - 1\right), \qquad (19)$$

where $C_1(W)$ and $C_2(W)$ are the sizes of the first and second largest components of W, respectively.

Proof: It is clear that $C_1(\mathbb{H})$ will take its largest value when F is a fully connected graph, i.e., a graph with edges between every pair of vertices in $\mathcal{N}_F$. In that case, the largest component of $\mathbb{H}$ is either a component of W that contains no node of $\mathcal{N}_F$ (and hence has at most $C_1(W)$ nodes), or the union of the components of W that contain the nodes in $\mathcal{N}_F$. With $C^i(W)$ denoting the set of nodes in the component (of W) that contains node i, the size of the latter satisfies

$$\Big|\bigcup_{i \in \mathcal{N}_F} C^i(W)\Big| \le C_1(W) + C_2(W) + \cdots + C_{|\mathcal{N}_F|}(W), \qquad (20)$$

where $C_j(W)$ stands for the jth largest component of W. The inequality (20) is easy to see once we write

$$\Big|\bigcup_{i \in \mathcal{N}_F} C^i(W)\Big| = \sum_{i=1}^{|\mathcal{N}_F|} \Big|C^{\mathcal{N}_F(i)}(W) \setminus \bigcup_{j=1}^{i-1} C^{\mathcal{N}_F(j)}(W)\Big|,$$

where $\mathcal{N}_F(i)$ is the ith element of $\mathcal{N}_F$. Each summand is either empty or an entire component of W, so the above quantity is a summation of the sizes of at most $|\mathcal{N}_F|$ mutually disjoint components of W. As a result, this summation can be no larger than the sum of the $|\mathcal{N}_F|$ largest components of W. The desired conclusion (19) is now immediate as we note that $C_2(W) \ge C_j(W)$ for all $j = 3,\ldots,|\mathcal{N}_F|$. ■
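Proposition 5.1 is easy to sanity-check on small random instances. The sketch below is our own illustration with hypothetical names. It draws W as a small ER graph, picks a random vertex set $\mathcal{N}_F$, and realizes the worst case used in the proof by connecting all of $\mathcal{N}_F$; a star on $\mathcal{N}_F$ suffices, since it produces the same components as a complete graph. It then compares $C_1(\mathbb{H})$ with the right-hand side of (19).

```python
import random


def component_sizes(n, edges):
    """Sizes of the connected components of a graph on {0, ..., n-1}, largest first."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def prop_5_1_check(n, p, n_f, seed=0):
    """Return (C1(H), right-hand side of (19)) for a random W and a worst-case F."""
    rng = random.Random(seed)
    w_edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    sizes = component_sizes(n, w_edges)
    c1, c2 = sizes[0], (sizes[1] if len(sizes) > 1 else 0)
    members = rng.sample(range(n), n_f)  # vertex set N_F (assumes n_f >= 1)
    f_edges = [(members[0], m) for m in members[1:]]  # star on N_F
    c1_h = component_sizes(n, w_edges + f_edges)[0]
    return c1_h, c1 + c2 * (n_f - 1)


print([prop_5_1_check(80, 0.02, 6, seed=s) for s in range(5)])
```

With no edges in W (p = 0), every component of W is a single node and the bound (19) holds with equality.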
The next result is an easy consequence of Proposition 5.1 and classical results [33] for ER graphs.



Corollary 5.1: Let W be an ER graph on the vertices $1,\ldots,n$ and let F be a graph whose vertex set $\mathcal{N}_F$ satisfies (18) whp. The following hold for $\mathbb{H} = W \cup F$:
(i) If W is in the subcritical regime (i.e., if $C_1(W) = o(n)$), then whp we have $C_1(\mathbb{H}) = o(n)$.
(ii) If $C_1(W) = \Theta(n)$, then we have $C_1(\mathbb{H}) = (1+o(1))\,C_1(W)$.

Proof: It is known [33] that for an ER graph W, either $C_1(W) = O(\log n)$ (subcritical regime) or $C_1(W) = \Theta(n)$ while $C_2(W) = O(\log n)$ (supercritical regime). Under condition (18), we see from (19) that whenever $C_1(W) = o(n)$ we have $C_1(\mathbb{H}) \le c\, n^\gamma \log n$ for some $c > 0$, and part (i) follows immediately. Next, assume that $C_1(W) = \Theta(n)$. The claim (ii) follows from (19) as we note that

$$\frac{C_2(W)\, n^\gamma}{C_1(W)} \to 0,$$

since $C_2(W) = O(\log n)$. ■
Corollary 5.1 shows that in the case where only a sub-linear fraction of the population uses online social networks, an information item originating at a particular node can reach a positive fraction of individuals if and only if information epidemics are already possible in the physical information network. Moreover, we see from Corollary 5.1 that the fractional size of a possible information epidemic in the conjoint social-physical network is the same as that of the physical information network alone. Combining these, we conclude that online social networks with $n^\gamma$ ($0 < \gamma < 1$) members have no effect on the (asymptotic) fraction of individuals that can be influenced by an information item in the conjoint social-physical network.

Along the same lines as Corollary 5.1, we next turn our attention to random graphs W with arbitrary degree distribution $\{p^w_k\}$ [35]. This time we rely on the results by Molloy and Reed [35, Theorem 1], who have shown that if there exists some $\epsilon > 0$ such that the degrees $d_i$ of W satisfy

$$\max\{d_i,\ i = 1,\ldots,n\} \le n^{1/4 - \epsilon} \qquad (21)$$

then in the supercritical regime (i.e., when $C_1(W) = \Theta(n)$) we have $C_2(W) = O(\log n)$. It was also shown [35] that in the subcritical regime of the phase transition, we have $C_1(W) = O(w(n)^2 \log n)$ whenever

$$\max\{d_i,\ i = 1,\ldots,n\} \le w(n) \qquad (22)$$

with $w(n) \le n^{1/8 - \epsilon}$ for some $\epsilon > 0$.

Now, consider a graph F whose vertex set $\mathcal{N}_F$ satisfies (18) whp, and let W be a random graph with a given degree sequence $\{d_i\}_{i=1}^n$ satisfying (21). In view of (19), it is easy to see that in the supercritical regime

$$C_1(\mathbb{H}) = (1+o(1))\,C_1(W),$$

where $\mathbb{H} = W \cup F$, and this provides an analog of Corollary 5.1(ii). It is also immediate that in the subcritical regime, we have

$$C_1(\mathbb{H}) = o(n)$$

as long as (22) is satisfied and (18) holds for some $\gamma < 3/4$; indeed, (19) and (22) then give $C_1(\mathbb{H}) = O(w(n)^2\, n^\gamma \log n) = O(n^{1/4 + \gamma - 2\epsilon}\log n)$. This establishes an analog of Corollary 5.1(i) for graphs with arbitrary degree distributions.

The above result takes a simpler form for the classes of random graphs W studied in Section IV-B, i.e., random graphs where the degrees follow a power-law distribution with exponential cutoff. In particular, let the degrees of W be distributed according to (16). It is easy to see that

$$\max\{d_i,\ i = 1,\ldots,n\} = O(\log n)$$

with high probability, so that conditions (21) and (22) are readily satisfied. For the latter condition, it suffices to take $w(n) = O(\log n)$, so that in the subcritical regime we have

$$C_1(W) = O\big((\log n)^3\big).$$

The next corollary is now an immediate consequence of Proposition 5.1.

Corollary 5.2: Let W be a random graph whose degrees follow the distribution specified in (16), and let F be a graph whose vertex set $\mathcal{N}_F$ satisfies (18) whp. The following hold for $\mathbb{H} = W \cup F$:
(i) If $C_1(W) = o(n)$, then whp we have $C_1(\mathbb{H}) = o(n)$.
(ii) If $C_1(W) = \Theta(n)$, then we have $C_1(\mathbb{H}) = (1+o(1))\,C_1(W)$.



VI. Proofs of Theorem 3.1 and Theorem 3.2

Consider random graphs $W(n;\{p^w_k\})$ and $F(n;\alpha,\{p^f_k\})$ as in Section III-A. In order to study the diffusion of information in the overlay network $\mathbb{H} = W \cup F$, we consider a branching process which starts by giving a piece of information to an arbitrary node, and then recursively reveals the nodes that are reached and informed by exploring the neighbors of informed nodes. Recall that information propagates from a node to each of its neighbors independently with probability $T_f$ through links in F (type-1) and with probability $T_w$ through links in W (type-2). In the following, we utilize the standard approach based on generating functions [31], [20], and determine the condition for the existence of a giant informed component as well as the final expected size of the information epidemics. This approach is valid as long as the initial stages of the branching process are locally tree-like, which holds in this case as the clustering coefficient of colored degree-driven networks scales like 1/n as n gets large [39].

We now solve for the survival probability of the aforementioned branching process by using the mean-field approach based on generating functions [31], [20]. Let $h_1(x)$ (resp. $h_2(x)$) denote the generating function for the finite number of nodes reached and informed by following a type-1 (resp. type-2) edge in the above branching process. In other words, we let $h_1(x) = \sum_m v_m x^m$, where $v_m$ is the probability that an arbitrary type-1 link leads to a finite informed component of size m; $h_2(x)$ is defined analogously for type-2 links. Finally, we let $H(x)$ denote the generating function for the finite number of nodes that receive an information item started from an arbitrary node.



We start by deriving the recursive relations governing $h_1(x)$ and $h_2(x)$. We find that the generating functions $h_1(x)$ and $h_2(x)$ satisfy the self-consistency equations

$$h_1(x) = x\sum_{d} \frac{d_f\, p_d}{\langle d_f\rangle}\, T_f\, h_1(x)^{d_f - 1} h_2(x)^{d_w} + (1 - T_f) \qquad (23)$$
$$h_2(x) = x\sum_{d} \frac{d_w\, p_d}{\langle d_w\rangle}\, T_w\, h_1(x)^{d_f} h_2(x)^{d_w - 1} + (1 - T_w) \qquad (24)$$



The validity of (23) can be seen as follows. The explicit factor x accounts for the initial vertex that is arrived at. The factor $d_f p_d/\langle d_f\rangle$ gives [20] the normalized probability that an edge of type 1 is attached (at the other end) to a vertex with colored degree $d = (d_f, d_w)$. Since the arrived node is reached by a type-1 link, it will receive the information with probability $T_f$. If the arrived node receives the information, it will be added to the component of informed nodes, and it can inform other nodes via its remaining $d_f - 1$ links of type 1 and $d_w$ links of type 2. Since the number of nodes reached and informed by each of its type-1 (resp. type-2) links is generated in turn by $h_1(x)$ (resp. $h_2(x)$), we obtain the term $h_1(x)^{d_f - 1} h_2(x)^{d_w}$ by the powers property of generating functions [31], [20]. Averaging over all possible colored degrees d gives the first term in (23). The second term, with the factor $x^0 = 1$, accounts for the possibility that the arrived node does not receive the information and thus is not included in the cluster of informed nodes. The relation (24) can be validated via similar arguments.

Using the relations (23)-(24), we now find the finite number of nodes reached and informed by the above branching process. We have that

$$H(x) = x\sum_{d} p_d\, h_1(x)^{d_f} h_2(x)^{d_w}. \qquad (25)$$



Similar to (23)-(24), the relation (25) can be seen as follows. The factor x corresponds to the initial node that is selected arbitrarily and given a piece of information. The selected node has colored degree $d = (d_f, d_w)$ with probability $p_d$. The number of nodes it reaches and informs via each of its $d_f$ (resp. $d_w$) links of type 1 (resp. type 2) is generated by $h_1(x)$ (resp. $h_2(x)$). This yields the term $h_1(x)^{d_f} h_2(x)^{d_w}$, and averaging over all possible colored degrees, we get (25).

We are interested in the solution of the recursive relations (23)-(24) for the case x = 1. This case exhibits a trivial fixed point $h_1(1) = h_2(1) = 1$, which yields $H(1) = 1$, meaning that the underlying branching process is in the subcritical regime and that all informed components have finite size, as understood from the conservation of probability. However, the fixed point $h_1(1) = h_2(1) = 1$ corresponds to the physical solution only if it is an attractor, i.e., a stable solution to the recursion (23)-(24). The stability of this fixed point can be checked via linearization of (23)-(24) around $h_1(1) = h_2(1) = 1$, which yields the Jacobian matrix J given by $J_{ij} = \left.\frac{\partial h_i(1)}{\partial h_j(1)}\right|_{h_1(1) = h_2(1) = 1}$ for $i, j = 1, 2$. This gives

$$J = \begin{bmatrix} \dfrac{T_f\langle d_f^2 - d_f\rangle}{\langle d_f\rangle} & \dfrac{T_f\langle d_f d_w\rangle}{\langle d_f\rangle} \\[2ex] \dfrac{T_w\langle d_w d_f\rangle}{\langle d_w\rangle} & \dfrac{T_w\langle d_w^2 - d_w\rangle}{\langle d_w\rangle} \end{bmatrix}. \qquad (26)$$
If all the eigenvalues of J are less than one in absolute value (i.e., if the spectral radius $\sigma(J)$ of J satisfies $\sigma(J) < 1$), then the solution $h_1(1) = h_2(1) = 1$ is an attractor and $H(1) = 1$ becomes the physical solution, meaning that $\mathbb{H}(n;\alpha,\{p^w_k\},T_w,\{p^f_k\},T_f)$ does not possess a giant component whp. In that case, the fraction of nodes that receive the information tends to zero in the limit $n \to \infty$. On the other hand, if the spectral radius of J is larger than one, then the fixed point $h_1(1) = h_2(1) = 1$ is unstable, indicating that the asymptotic branching process is supercritical, with a positive probability of producing infinite trees. In that case, a nontrivial fixed point exists and becomes the attractor of the recursions (23)-(24), yielding a solution with $h_1(1), h_2(1) < 1$. In view of (25), this implies $H(1) < 1$, and the corresponding probability deficit $1 - H(1)$ is attributed to the existence of a giant (infinite) component of informed nodes. In fact, the quantity $1 - H(1)$ is equal to the probability that a randomly chosen vertex belongs to the giant component, which contains asymptotically a fraction $1 - H(1)$ of the vertices.

Collecting these, Theorem 3.1 is now within easy reach. First, recall that $k_f$ and $k_w$ are random variables independently drawn from the distributions $\{p^f_k\}$ and $\{p^w_k\}$, respectively, so that $d_f$ is a random variable that is statistically equivalent to $k_f$ with probability $\alpha$, and equal to zero otherwise. On the other hand, we have $k_w = d_w$. Using these in (26), we now get

$$J = \begin{bmatrix} T_f\beta_f & T_f\lambda_w \\ T_w\alpha\lambda_f & T_w\beta_w \end{bmatrix}$$

by the independence of $d_f$ and $d_w$. It is now a simple matter to see that $\sigma(J) = \sigma^\star_{fw}$, where $\sigma^\star_{fw}$ is the critical threshold appearing in Theorem 3.1. Therefore, we have established that the epidemic threshold is given by $\sigma^\star_{fw} = 1$, and part (i) of Theorem 3.1 follows.
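For completeness, the step above amounts to the eigenvalue formula for a 2×2 matrix. Since J has nonnegative entries and a nonnegative trace, its spectral radius is its largest (real) eigenvalue (our own short computation):

$$\operatorname{tr} J = T_f\beta_f + T_w\beta_w, \qquad \det J = T_f T_w\left(\beta_f\beta_w - \alpha\lambda_f\lambda_w\right),$$

$$\sigma(J) = \tfrac{1}{2}\left(\operatorname{tr} J + \sqrt{(\operatorname{tr} J)^2 - 4\det J}\right) = \tfrac{1}{2}\left(T_f\beta_f + T_w\beta_w + \sqrt{(T_f\beta_f - T_w\beta_w)^2 + 4\alpha T_f T_w\lambda_f\lambda_w}\right).$$

With $\beta_f = \lambda_f$ and $\beta_w = \lambda_w$ (Poisson degrees), $(T_f\lambda_f - T_w\lambda_w)^2 + 4\alpha T_fT_w\lambda_f\lambda_w = (T_f\lambda_f + T_w\lambda_w)^2 - 4(1-\alpha)T_f\lambda_f T_w\lambda_w$, so $\sigma(J)$ coincides with $\lambda^\star_{fw}$ in (14).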

Next, we set x = 1 in the recursive relations (23)-(24) and let $h_1 := h_1(1)$ and $h_2 := h_2(1)$. Using the above description of $d_f$ and $d_w$ and elementary algebra, we find that the stable solution of the recursions (23)-(24) is given by the smallest solution of the corresponding equations in part (ii) of Theorem 3.1 with $h_1, h_2$ in $(0, 1]$. It is also easy to check from (25) that

$$H(1) = \mathbb{E}\left[\alpha h_1^{k_f} + 1 - \alpha\right] \cdot \mathbb{E}\left[h_2^{k_w}\right],$$

and part (ii) of Theorem 3.1 follows upon recalling that the fractional size of the giant component (i.e., the fraction of informed nodes) is given by $1 - H(1)$ whp. ■
We now turn to proving Theorem 3.2. In the subcritical regime, $H(x)$ corresponds to the generating function for the distribution of outbreak sizes, i.e., the distribution of the number of nodes that receive an information item started from an arbitrary node. Therefore, the mean outbreak size is given by the first derivative of $H(x)$ at the point x = 1. Namely, we have

$$\langle s\rangle = \left.\frac{dH(x)}{dx}\right|_{x=1} = H'(1). \qquad (27)$$

Recalling that $h_1(1) = h_2(1) = 1$ in the subcritical regime, we get from (25) that

$$H'(1) = 1 + h_1'(1)\sum_d p_d d_f + h_2'(1)\sum_d p_d d_w = 1 + \alpha\lambda_f h_1'(1) + \lambda_w h_2'(1). \qquad (28)$$



The derivatives $h_1'(1)$ and $h_2'(1)$ can also be computed recursively using the relations (23)-(24). In fact, it is easy to check that

$$h_1'(1) = T_f + T_f\beta_f h_1'(1) + T_f\lambda_w h_2'(1) \qquad (29)$$
$$h_2'(1) = T_w + T_w\alpha\lambda_f h_1'(1) + T_w\beta_w h_2'(1) \qquad (30)$$

upon using the definitions of $\lambda_f$, $\lambda_w$, $\beta_f$ and $\beta_w$. Now, setting $s_1 := h_1'(1)$ and $s_2 := h_2'(1)$, we obtain (11)-(12) from (29)-(30), and Theorem 3.2 follows upon substituting into (27)-(28). ■

VII. Proof of Theorem 3.3



In this section, we give a proof of Theorem 3.3. First, we summarize the technical tools that will be used.

A. Inhomogeneous Random Graphs 

Recently, Bollobás, Janson and Riordan [38] have developed a new theory of inhomogeneous random graphs that allows studying the phase transition properties of complex networks in a rigorous fashion. The authors in [38] established very general results for various properties of these models, including the critical point of their phase transition, as well as the size of their giant component. Here, we summarize these tools, focusing on the results used in this paper.

At the outset, assume that a graph is defined on vertices $\{1,\ldots,n\}$, where each vertex i is assigned, randomly or deterministically, a point $x_i$ in a metric space $\mathcal{S}$. Assume that the metric space $\mathcal{S}$ is equipped with a Borel probability measure $\mu$ such that for any $\mu$-continuity set $A \subseteq \mathcal{S}$ (see [38])

$$\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[x_i \in A] \xrightarrow{p} \mu(A). \qquad (31)$$

A vertex space $\mathcal{V}$ is then defined as a triple $(\mathcal{S}, \mu, \{x_1,\ldots,x_n\})$, where $\{x_1,\ldots,x_n\}$ is a sequence of points in $\mathcal{S}$ satisfying (31).

Next, let a kernel $\kappa$ on the space $(\mathcal{S}, \mu)$ be a symmetric, non-negative, measurable function on $\mathcal{S} \times \mathcal{S}$. The random graph $G^{\mathcal{V}}(n,\kappa)$ on the vertices $\{1,\ldots,n\}$ is then constructed by assigning an edge between i and j (i < j) with probability $\kappa(x_i, x_j)/n$, independently of all the other edges in the graph.

Consider random graphs $G^{\mathcal{V}}(n,\kappa)$ for which the kernel $\kappa$ is bounded and continuous a.e. on $\mathcal{S} \times \mathcal{S}$. In fact, in this study it suffices to consider only the cases where the metric space $\mathcal{S}$ consists of finitely many points, i.e., $\mathcal{S} = \{1,\ldots,r\}$. Under these assumptions, the kernel $\kappa$ reduces to an $r \times r$ matrix, and $G^{\mathcal{V}}(n,\kappa)$ becomes a random graph with vertices of r different types, e.g., vertices with or without Facebook membership. Two nodes (in $G^{\mathcal{V}}(n,\kappa)$) of type i and j are joined by an edge with probability $n^{-1}\kappa(i,j)$, and the condition (31) reduces to

$$\frac{n_i}{n} \xrightarrow{p} \mu_i \qquad (32)$$

where $n_i$ stands for the number of nodes of type i and $\mu_i$ is equal to $\mu(\{i\})$.

As usual, the phase transition properties of $G^{\mathcal{V}}(n,\kappa)$ can be studied by exploiting the connection between the component structure of the graph and the survival probability of a related branching process. In particular, consider a branching process that starts with an arbitrary vertex and recursively reveals the largest component reached by exploring its neighbors. For each $i = 1,\ldots,r$, we let $\rho(\kappa; i)$ denote the probability that the branching process produces infinite trees when it starts with a node of type i. The survival probability $\rho(\kappa)$ of the branching process is then given by

$$\rho(\kappa) = \sum_{i=1}^{r} \rho(\kappa; i)\,\mu_i. \qquad (33)$$

In analogy with the classical results for ER graphs [33], it is shown [38] that the $\rho(\kappa; i)$'s satisfy the recursive equations

$$\rho(\kappa; i) = 1 - \exp\Big\{-\sum_{j=1}^{r} \kappa(i,j)\,\mu_j\,\rho(\kappa; j)\Big\}, \qquad i = 1,\ldots,r. \qquad (34)$$

The value of $\rho(\kappa)$ can be computed via (33) by characterizing the stable fixed point of (34) reached from the starting point $\rho(\kappa;1) = \cdots = \rho(\kappa;r) = 1$. It is a simple matter to check that, with M denoting an $r \times r$ matrix given by $M(i,j) = \kappa(i,j)\mu_j$, the iterated map (34) has a non-trivial solution (i.e., a solution other than $\rho(\kappa;1) = \cdots = \rho(\kappa;r) = 0$) if and only if

$$\sigma(M) := \max\{|\lambda_i| : \lambda_i \text{ is an eigenvalue of } M\} > 1. \qquad (35)$$

Thus, we see that if the spectral radius of M is less than or equal to one, the branching process is subcritical with $\rho(\kappa) = 0$ and the graph $G^{\mathcal{V}}(n,\kappa)$ has no giant component, i.e., we have $C_1(G^{\mathcal{V}}(n,\kappa)) = o(n)$ whp.

On the other hand, if $\sigma(M) > 1$, then the branching process is supercritical and there is a non-trivial solution $\rho(\kappa; i) > 0$, $i = 1,\ldots,r$, that corresponds to a stable fixed point of (34). In that case, $\rho(\kappa) > 0$ corresponds to the probability that an arbitrary node belongs to the giant component, which asymptotically contains a fraction $\rho(\kappa)$ of the vertices. In other words, if $\sigma(M) > 1$, we have $C_1(G^{\mathcal{V}}(n,\kappa)) = \Theta(n)$ whp, and $\frac{1}{n}C_1(G^{\mathcal{V}}(n,\kappa)) \xrightarrow{p} \rho(\kappa)$.
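The recipe in (33)-(35) works for any finite number of types r. The sketch below is our own, with hypothetical names. It builds $M(i,j) = \kappa(i,j)\mu_j$, estimates $\sigma(M)$ by power iteration, and computes $\rho(\kappa)$ by iterating (34) from $\rho(\kappa; i) = 1$. The power iteration assumes M has strictly positive entries, so that the Perron root dominates.

```python
import math


def mean_matrix(kappa, mu):
    """M(i, j) = kappa(i, j) * mu_j."""
    r = len(mu)
    return [[kappa[i][j] * mu[j] for j in range(r)] for i in range(r)]


def spectral_radius(m, tol=1e-13, max_iter=10_000):
    """Perron root of a matrix with positive entries, by power iteration (max-norm)."""
    v = [1.0] * len(m)
    est = 0.0
    for _ in range(max_iter):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in m]
        norm = max(w)
        v = [x / norm for x in w]
        if abs(norm - est) < tol:
            break
        est = norm
    return norm


def survival_probability(kappa, mu, tol=1e-13, max_iter=100_000):
    """Iterate (34) from rho(kappa; i) = 1 and return rho(kappa) as in (33)."""
    r = len(mu)
    rho = [1.0] * r
    for _ in range(max_iter):
        new = [1 - math.exp(-sum(kappa[i][j] * mu[j] * rho[j] for j in range(r)))
               for i in range(r)]
        converged = max(abs(x - y) for x, y in zip(new, rho)) < tol
        rho = new
        if converged:
            break
    return sum(mu[i] * rho[i] for i in range(r))


# Two-type kernel of Section VII-B without the O(1/n) term: alpha = 0.5, lambda_w = lambda_f = 1.
alpha, lam_w, lam_f = 0.5, 1.0, 1.0
kappa = [[lam_w + lam_f / alpha, lam_w], [lam_w, lam_w]]
mu = [alpha, 1 - alpha]
print(spectral_radius(mean_matrix(kappa, mu)), survival_probability(kappa, mu))
```

With a single type and $\kappa = c$, this reduces to the classical ER case: $\sigma(M) = c$ and $\rho = 1 - e^{-c\rho}$.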

Bollobás et al. [38, Theorem 3.12] have shown that the bound $C_1(G^{\mathcal{V}}(n,\kappa)) = o(n)$ in the subcritical case can be improved under some additional conditions: they established that whenever $\sup_{i,j}\kappa(i,j) < \infty$ and $\sigma(M) < 1$, we have $C_1(G^{\mathcal{V}}(n,\kappa)) = O(\log n)$ whp, as in the case of ER graphs. They have also shown that if either $\sup_{i,j}\kappa(i,j) < \infty$ or $\inf_{i,j}\kappa(i,j) > 0$, then in the supercritical regime (i.e., when $\sigma(M) > 1$) the second largest component satisfies $C_2(G^{\mathcal{V}}(n,\kappa)) = O(\log n)$ whp.



B. A Proof of Theorem 3.3

We start by studying the information spread over the net- 
work H when information transmissibilities T w and Tf are 
both equal to 1. Clearly, this corresponds to studying the 
phase transition in H = H(n; a, X w , A/), and we will do so 
by using the techniques summarized in the previous section. 
Let $S = \{1, 2\}$ stand for the space of vertex types, where vertices with Facebook membership are referred to as type 1 while vertices without Facebook membership are said to be of type 2; notice that this is different from the proof of Theorem 3.1, where we distinguish between different link types. In other words, we let

$$\text{type}(i) = \begin{cases} 1 & \text{if } i \in \mathcal{N}_F, \\ 2 & \text{if } i \notin \mathcal{N}_F, \end{cases}$$

for each $i = 1, \dots, n$. Assume that the metric space $S$ is equipped with a probability measure $\mu$ that satisfies condition (32); i.e., $\mu(\{1\}) := \mu_1 = \alpha$ and $\mu(\{2\}) := \mu_2 = 1 - \alpha$. Finally, we compute the appropriate kernel $\kappa$ such that, for each $i, j \in \{1, 2\}$, $\kappa(i,j)/n$ gives the probability that two vertices of type $i$ and $j$ are connected. Clearly, we have



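$$\kappa(1,1) = n\left(1 - \left(1 - \frac{\lambda_w}{n}\right)\left(1 - \frac{\lambda_f}{\alpha n}\right)\right) = \lambda_w + \frac{\lambda_f}{\alpha} - \frac{\lambda_w \lambda_f}{\alpha n},$$

whereas

$$\kappa(1,2) = \kappa(2,1) = \kappa(2,2) = \lambda_w.$$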
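The model behind this kernel can also be checked by direct simulation. The sketch below is a hypothetical helper (not from the paper) that samples one instance of $H = W \cup F$ with $T_w = T_f = 1$ and returns the fraction of vertices in its largest component, using union-find. As a simplifying assumption, each ER graph is drawn with a fixed number of uniformly random vertex pairs equal to its expected edge count, a $G(n, m)$-style stand-in for the $G(n, p)$ model that should make no difference for large $n$.

```python
import random


def simulate_conjoint_giant_fraction(n, alpha, lam_w, lam_f, seed=0):
    """Largest-component fraction of one sample of H = W u F (T_w = T_f = 1).

    Assumption: each ER graph is approximated by drawing (mean degree * size / 2)
    uniformly random vertex pairs; self-loops and repeated pairs are allowed and
    have negligible effect for large n.
    """
    rng = random.Random(seed)
    parent = list(range(n))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Physical network W on all n vertices, mean degree ~ lam_w.
    for _ in range(round(lam_w * n / 2)):
        union(rng.randrange(n), rng.randrange(n))

    # Facebook membership: each vertex joins N_F independently with prob. alpha.
    members = [v for v in range(n) if rng.random() < alpha]
    k = len(members)
    # Online network F on N_F only, mean degree ~ lam_f.
    for _ in range(round(lam_f * k / 2)):
        union(members[rng.randrange(k)], members[rng.randrange(k)])

    # Count component sizes and return the largest as a fraction of n.
    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n
```

For illustrative values $n = 50\,000$, $\alpha = 0.5$ and $\lambda_w = \lambda_f = 0.8$, neither $W$ nor $F$ alone has mean degree above one, yet the sampled conjoint network should have a largest component close to the fraction of roughly 0.39 that (37) and (38) predict; with `lam_f=0.0` only $W$ remains and the fraction becomes small.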



We are now in a position to derive the critical point of the phase transition in $H(n; \alpha, \lambda_w, \lambda_f)$ as well as the giant component size $C_1(H(n; \alpha, \lambda_w, \lambda_f))$. First, we compute the matrix $M(i,j) = \kappa(i,j)\,\mu_j$ and get



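$$M = \begin{bmatrix} \alpha\lambda_w + \lambda_f - \dfrac{\lambda_w \lambda_f}{n} & (1-\alpha)\lambda_w \\ \alpha\lambda_w & (1-\alpha)\lambda_w \end{bmatrix}.$$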



It is clear that the term $\lambda_w \lambda_f / n$ has no effect on the results as we eventually let $n$ go to infinity. It is now a simple matter to check that the spectral radius of $M$ is given by



$$\sigma(M) = \frac{1}{2}\left(\lambda_f + \lambda_w + \sqrt{(\lambda_f + \lambda_w)^2 - 4(1-\alpha)\lambda_f \lambda_w}\right).$$
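The "simple matter to check" can be spelled out through the trace and determinant of the $2 \times 2$ matrix $M$ (with the vanishing $\lambda_w\lambda_f/n$ term dropped); $\nu_{\pm}$ below denotes its two eigenvalues:

```latex
\begin{aligned}
\operatorname{tr}(M) &= (\alpha\lambda_w + \lambda_f) + (1-\alpha)\lambda_w = \lambda_w + \lambda_f, \\
\det(M) &= (\alpha\lambda_w + \lambda_f)(1-\alpha)\lambda_w - \alpha\lambda_w\,(1-\alpha)\lambda_w = (1-\alpha)\lambda_w\lambda_f, \\
\nu_{\pm} &= \tfrac{1}{2}\Big(\operatorname{tr}(M) \pm \sqrt{\operatorname{tr}(M)^2 - 4\det(M)}\Big), \\
\operatorname{tr}(M)^2 - 4\det(M) &= (\lambda_w - \lambda_f)^2 + 4\alpha\lambda_w\lambda_f \;\ge\; 0 .
\end{aligned}
```

Since the discriminant is nonnegative and both $\operatorname{tr}(M)$ and $\det(M)$ are nonnegative, both eigenvalues are real and nonnegative, so $\sigma(M) = \nu_{+}$. The last line also gives $\sigma(M) \ge \max(\lambda_w, \lambda_f)$, with $\sigma(M) = \lambda_w + \lambda_f$ when $\alpha = 1$.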



This leads to the conclusion that the random graph $H(n; \alpha, \lambda_w, \lambda_f)$ has a giant component if and only if

$$\frac{1}{2}\left(\lambda_f + \lambda_w + \sqrt{(\lambda_f + \lambda_w)^2 - 4(1-\alpha)\lambda_f \lambda_w}\right) > 1 \qquad (36)$$



as we recall (35). If the left-hand side of (36) is strictly less than one, then we have $C_1(H(n; \alpha, \lambda_w, \lambda_f)) = O(\log n)$ whp, as we note that $\sup_{i,j} \kappa(i,j) < \infty$. From [38, Theorem 3.12], we also get that $C_2(H(n; \alpha, \lambda_w, \lambda_f)) = O(\log n)$ whp whenever (36) is satisfied.
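Condition (36) is easy to evaluate directly. The sketch below uses hypothetical function names and illustrative parameter values (neither is from the paper) to show the effect highlighted in the abstract: $W$ and $F$ can each be subcritical while their conjunction is supercritical.

```python
import math


def conjoint_sigma(alpha, lam_w, lam_f):
    """Left-hand side of (36): sigma(M) for H(n; alpha, lam_w, lam_f) as n -> infinity."""
    s = lam_w + lam_f
    # The discriminant is >= (lam_w - lam_f)**2 >= 0 for 0 <= alpha <= 1;
    # max() only guards against tiny negative values from floating-point rounding.
    disc = max(0.0, s * s - 4.0 * (1.0 - alpha) * lam_w * lam_f)
    return 0.5 * (s + math.sqrt(disc))


def has_information_epidemic(alpha, lam_w, lam_f):
    """Condition (36): an information epidemic (giant component) exists iff sigma(M) > 1."""
    return conjoint_sigma(alpha, lam_w, lam_f) > 1.0


if __name__ == "__main__":
    # Hypothetical parameters: W and F each have mean degree 0.8 < 1 ...
    alpha, lam_w, lam_f = 0.5, 0.8, 0.8
    print(lam_w > 1.0, lam_f > 1.0)  # False False
    # ... yet the conjoint network satisfies (36).
    print(conjoint_sigma(alpha, lam_w, lam_f), has_information_epidemic(alpha, lam_w, lam_f))  # about 1.366, True
```

The two extremes behave as the closed form suggests: with $\alpha = 1$ every vertex is on Facebook and $\sigma(M) = \lambda_w + \lambda_f$, while with $\alpha = 0$ it reduces to $\max(\lambda_w, \lambda_f)$.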

Next, we compute the size of the giant component whenever it exists. Let $\rho(\kappa; 1) = \rho_1$ and $\rho(\kappa; 2) = \rho_2$. In view of (33) and the arguments presented previously, the asymptotic fraction of nodes in the giant component is given by



$$\rho(\kappa) = \alpha\rho_1 + (1-\alpha)\rho_2, \qquad (37)$$



where $\rho_1$ and $\rho_2$ constitute a stable simultaneous solution of the transcendental equations



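$$\begin{aligned} \rho_1 &= 1 - \exp\{-\rho_1(\alpha\lambda_w + \lambda_f) - \rho_2(1-\alpha)\lambda_w\}, \\ \rho_2 &= 1 - \exp\{-\rho_1\alpha\lambda_w - \rho_2(1-\alpha)\lambda_w\}. \end{aligned} \qquad (38)$$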
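Equations (38) have no closed-form solution, but the stable solution is reached by fixed-point iteration from $(\rho_1, \rho_2) = (1, 1)$. A minimal standard-library sketch (the function name is hypothetical) that then evaluates (37):

```python
import math


def epidemic_size_two_types(alpha, lam_w, lam_f, iters=100_000, tol=1e-14):
    """Solve (38) by fixed-point iteration from (1, 1) and evaluate (37).

    Entries of M as in the proof, with the vanishing lam_w*lam_f/n term dropped.
    Returns (rho, rho_1, rho_2).
    """
    m11, m12 = alpha * lam_w + lam_f, (1.0 - alpha) * lam_w
    m21, m22 = alpha * lam_w, (1.0 - alpha) * lam_w
    r1, r2 = 1.0, 1.0
    for _ in range(iters):
        n1 = 1.0 - math.exp(-r1 * m11 - r2 * m12)  # first line of (38)
        n2 = 1.0 - math.exp(-r1 * m21 - r2 * m22)  # second line of (38)
        converged = abs(n1 - r1) < tol and abs(n2 - r2) < tol
        r1, r2 = n1, n2
        if converged:
            break
    return alpha * r1 + (1.0 - alpha) * r2, r1, r2
```

For the illustrative parameters $\alpha = 0.5$, $\lambda_w = \lambda_f = 0.8$, this gives $\rho \approx 0.39$ with $\rho_1 > \rho_2$, i.e., Facebook members are more likely to be reached. With $\alpha = 1$ the second equation carries zero weight and the first reduces to the ER equation $\rho = 1 - e^{-(\lambda_w + \lambda_f)\rho}$.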



So far, we have established the epidemic threshold and the size of the information epidemic when $T_w = T_f = 1$. In the more general case where there is no constraint on the transmissibilities, the subgraph of the online social network $F$ formed by the edges over which information is actually transmitted is an ER graph with average degree $T_f\lambda_f$, whereas the corresponding subgraph of the physical network $W$ is an ER graph with average degree $T_w\lambda_w$. Therefore, the critical threshold and the size of the information epidemic can be found by substituting $T_f\lambda_f$ for $\lambda_f$ and $T_w\lambda_w$ for $\lambda_w$ in the relations (36), (37) and (38). This establishes Theorem 3.3. ■
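One consequence of this substitution is worth making explicit. If both transmissibilities equal a common value $T$, every entry of $M$ is multiplied by $T$, so $\sigma(M)$ scales linearly in $T$ and an epidemic occurs iff $T > 1/\sigma(M)$, with $\sigma(M)$ evaluated at $T_w = T_f = 1$. A short sketch under that assumption (function names hypothetical):

```python
import math


def conjoint_sigma(alpha, lam_w, lam_f):
    """Left-hand side of (36) for mean degrees lam_w, lam_f."""
    s = lam_w + lam_f
    disc = max(0.0, s * s - 4.0 * (1.0 - alpha) * lam_w * lam_f)
    return 0.5 * (s + math.sqrt(disc))


def conjoint_sigma_with_transmissibility(alpha, lam_w, lam_f, t_w, t_f):
    """Apply the substitution lam_w -> t_w*lam_w, lam_f -> t_f*lam_f in (36)."""
    return conjoint_sigma(alpha, t_w * lam_w, t_f * lam_f)


def critical_uniform_transmissibility(alpha, lam_w, lam_f):
    """Threshold T_c for a common transmissibility T = T_w = T_f.

    conjoint_sigma is positively homogeneous of degree 1 in (lam_w, lam_f),
    so sigma at transmissibility T equals T * sigma at T = 1; hence
    T_c = 1 / sigma(M) evaluated with T_w = T_f = 1.
    """
    return 1.0 / conjoint_sigma(alpha, lam_w, lam_f)
```

With $\alpha = 0.5$ and $\lambda_w = \lambda_f = 0.8$ (illustrative values), $T_c \approx 0.73$, whereas neither network alone can sustain an epidemic for any $T \le 1$, since $T\lambda \le 0.8 < 1$ in each.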
VIII. Conclusion

In this paper, we characterized the critical threshold and 
the asymptotic size of information epidemics in an overlaying 
social-physical network. To capture the spread of information, 
we considered a physical information network that character- 
izes the face-to-face interactions of human beings, and some 
overlaying online social networks (e.g., Facebook, Twitter, 
etc.) that are defined on a subset of the population. Assuming 
that information is transmitted between individuals according 
to the SIR model, we showed that the critical point and the size 
of information epidemics on this overlaying social-physical 
network can be precisely determined. 

To the best of our knowledge, this study marks the first work on the phase transition properties of conjoint networks where the vertex sets are neither identical (as in […], [25]) nor disjoint (as in [27]). We believe that our findings here shed light on further studies of information (and influence) propagation across social-physical networks.
Appendix

Information Diffusion with Multiple Online 
Social Networks 

So far, we have assumed that information diffuses amongst human beings via only a physical information network $W$ and an online social network $F$. To be more general, one can extend this model to the case where there are multiple online social networks. For instance, assume that there is an additional online social network, say Twitter, denoted by $T(n; \alpha_t)$, whose members are selected by picking each node $1, \dots, n$ independently with probability $\alpha_t$. In other words, with $\mathcal{N}_T$ denoting the set of vertices of $T$, we have

$$\mathbb{P}[i \in \mathcal{N}_T] = \alpha_t, \qquad i = 1, \dots, n.$$
To be consistent with this notation, we assume that the members of the online social network $F$ (i.e., Facebook) are selected by picking each node $1, \dots, n$ independently with probability $\alpha_f$.

The overlaying social-physical network now consists of a network formed by conjoining $W$, $F$ and $T$; i.e., we have $H = W \cup F \cup T$. To demonstrate the applicability of the techniques for this general set-up, we now consider a simple example where $W$, $F$ and $T$ are all ER graphs with edge probabilities given by $\lambda_w/n$, $\lambda_f/(n\alpha_f)$ and $\lambda_t/(n\alpha_t)$, respectively. This yields asymptotic mean degrees of $\lambda_w$, $\lambda_f$ and $\lambda_t$ in the networks $W$, $F$ and $T$, respectively. For the time being, assume that the transmissibilities $T_w$, $T_f$ and $T_t$ are all equal to one.

Now, recall the concept of inhomogeneous random graphs presented in Section VII-A. Let $S = \{1, 2, 3, 4\}$ stand for the space of vertex types, where vertices with both Facebook and Twitter membership are referred to as type 1, vertices with Facebook membership but without Twitter membership are referred to as type 2, vertices with Twitter membership but without Facebook membership are referred to as type 3, and finally vertices with neither Facebook nor Twitter membership are said to be of type 4. That is, we set

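$$\text{type}(i) = \begin{cases} 1 & \text{if } i \in \mathcal{N}_F \text{ and } i \in \mathcal{N}_T, \\ 2 & \text{if } i \in \mathcal{N}_F \text{ and } i \notin \mathcal{N}_T, \\ 3 & \text{if } i \notin \mathcal{N}_F \text{ and } i \in \mathcal{N}_T, \\ 4 & \text{if } i \notin \mathcal{N}_F \text{ and } i \notin \mathcal{N}_T, \end{cases}$$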

for each $i = 1, \dots, n$. Assume that the metric space $S$ is equipped with a probability measure $\mu$ that satisfies condition (32); i.e., $\mu_1 = \alpha_f\alpha_t$, $\mu_2 = \alpha_f(1-\alpha_t)$, $\mu_3 = (1-\alpha_f)\alpha_t$, and $\mu_4 = (1-\alpha_f)(1-\alpha_t)$. The next step is to compute the appropriate kernel $\kappa$ such that, for each $i, j \in \{1, 2, 3, 4\}$, $\kappa(i,j)/n$ gives the probability that two vertices of type $i$ and $j$ are connected. For $n$ large, it is not difficult to see that we have



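$$\kappa = \begin{bmatrix}
\lambda_w + \dfrac{\lambda_f}{\alpha_f} + \dfrac{\lambda_t}{\alpha_t} & \lambda_w + \dfrac{\lambda_f}{\alpha_f} & \lambda_w + \dfrac{\lambda_t}{\alpha_t} & \lambda_w \\
\lambda_w + \dfrac{\lambda_f}{\alpha_f} & \lambda_w + \dfrac{\lambda_f}{\alpha_f} & \lambda_w & \lambda_w \\
\lambda_w + \dfrac{\lambda_t}{\alpha_t} & \lambda_w & \lambda_w + \dfrac{\lambda_t}{\alpha_t} & \lambda_w \\
\lambda_w & \lambda_w & \lambda_w & \lambda_w
\end{bmatrix} \qquad (A.1)$$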
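The entries of (A.1) follow from the independence of the three edge processes: vertices of types $i$ and $j$ are non-adjacent only if none of $W$, $F$ (when both lie in $\mathcal{N}_F$) and $T$ (when both lie in $\mathcal{N}_T$) places an edge between them. The sketch below (hypothetical function names, standard library only) computes $n$ times the exact finite-$n$ connection probability and compares it with the limit kernel; the gap shrinks like $1/n$.

```python
# Membership of each type (index 0..3 <-> types 1..4) in N_F and N_T.
IN_F = (True, True, False, False)
IN_T = (True, False, True, False)


def exact_kernel(n, alpha_f, alpha_t, lam_w, lam_f, lam_t):
    """n * P[a type-i and a type-j vertex are adjacent in H = W u F u T].

    Edges of W, F and T appear independently with probabilities lam_w/n,
    lam_f/(n*alpha_f) and lam_t/(n*alpha_t); F (resp. T) can only contribute
    when both endpoints belong to N_F (resp. N_T).
    """
    kappa = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            p_none = 1.0 - lam_w / n
            if IN_F[i] and IN_F[j]:
                p_none *= 1.0 - lam_f / (n * alpha_f)
            if IN_T[i] and IN_T[j]:
                p_none *= 1.0 - lam_t / (n * alpha_t)
            kappa[i][j] = n * (1.0 - p_none)
    return kappa


def limit_kernel(alpha_f, alpha_t, lam_w, lam_f, lam_t):
    """The n -> infinity kernel, entry by entry as in (A.1)."""
    a = lam_w + lam_f / alpha_f + lam_t / alpha_t
    b = lam_w + lam_f / alpha_f
    c = lam_w + lam_t / alpha_t
    w = lam_w
    return [[a, b, c, w],
            [b, b, w, w],
            [c, w, c, w],
            [w, w, w, w]]
```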



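The matrix $M(i,j) = \kappa(i,j)\,\mu_j$ is now given by

$$M = \begin{bmatrix}
\lambda_w\alpha_f\alpha_t + \lambda_f\alpha_t + \lambda_t\alpha_f & \lambda_w\alpha_f(1-\alpha_t) + \lambda_f(1-\alpha_t) & \lambda_w(1-\alpha_f)\alpha_t + \lambda_t(1-\alpha_f) & \lambda_w(1-\alpha_f)(1-\alpha_t) \\
\lambda_w\alpha_f\alpha_t + \lambda_f\alpha_t & \lambda_w\alpha_f(1-\alpha_t) + \lambda_f(1-\alpha_t) & \lambda_w(1-\alpha_f)\alpha_t & \lambda_w(1-\alpha_f)(1-\alpha_t) \\
\lambda_w\alpha_f\alpha_t + \lambda_t\alpha_f & \lambda_w\alpha_f(1-\alpha_t) & \lambda_w(1-\alpha_f)\alpha_t + \lambda_t(1-\alpha_f) & \lambda_w(1-\alpha_f)(1-\alpha_t) \\
\lambda_w\alpha_f\alpha_t & \lambda_w\alpha_f(1-\alpha_t) & \lambda_w(1-\alpha_f)\alpha_t & \lambda_w(1-\alpha_f)(1-\alpha_t)
\end{bmatrix}$$

and the critical point of the phase transition as well as the giant component size of $H(n; \alpha_f, \alpha_t, \lambda_w, \lambda_f, \lambda_t)$ can now be obtained by using the arguments of Section VII-A. An item of information originating from a single node in $H = W \cup F \cup T$ can reach a positive fraction of the individuals only if the spectral radius of $M$ is greater than unity. If $\sigma(M) < 1$, then there is no information epidemic and all information outbreaks have size $O(\log n)$.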
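The threshold test above can be evaluated numerically. The sketch below (hypothetical helper names, standard library only) builds $M$ from (A.1) and the type probabilities $\mu_j$, then estimates $\sigma(M)$ by power iteration; it assumes $\lambda_w > 0$ and $0 < \alpha_f, \alpha_t \le 1$.

```python
def four_type_matrix(alpha_f, alpha_t, lam_w, lam_f, lam_t):
    """Return (M, mu) with M(i, j) = kappa(i, j) * mu_j, kappa from (A.1)."""
    mu = [alpha_f * alpha_t, alpha_f * (1.0 - alpha_t),
          (1.0 - alpha_f) * alpha_t, (1.0 - alpha_f) * (1.0 - alpha_t)]
    a = lam_w + lam_f / alpha_f + lam_t / alpha_t
    b = lam_w + lam_f / alpha_f
    c = lam_w + lam_t / alpha_t
    w = lam_w
    kappa = [[a, b, c, w], [b, b, w, w], [c, w, c, w], [w, w, w, w]]
    M = [[kappa[i][j] * mu[j] for j in range(4)] for i in range(4)]
    return M, mu


def spectral_radius(M, iters=10_000, tol=1e-12):
    """Perron root of a nonnegative matrix by max-normalised power iteration."""
    r = len(M)
    x = [1.0] * r
    lam = 0.0
    for _ in range(iters):
        y = [sum(M[i][j] * x[j] for j in range(r)) for i in range(r)]
        new_lam = max(y)  # x is scaled so that max(x) == 1
        if new_lam == 0.0:
            return 0.0
        x = [v / new_lam for v in y]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

Two checks follow from the structure of $M$. With $\lambda_t = 0$, types 1 and 2 (and types 3 and 4) become indistinguishable, so $\sigma(M)$ should coincide with the left-hand side of (36) at $\alpha = \alpha_f$. And because every entry of $M$ is nondecreasing in $\lambda_t$, the Perron-Frobenius monotonicity of the spectral radius implies that adding the Twitter layer can only increase $\sigma(M)$.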



The fractional size of the giant component (i.e., information epidemic) can also be found. Recalling (33) and (34), we see that

$$\frac{1}{n} C_1(H(n; \alpha_f, \alpha_t, \lambda_w, \lambda_f, \lambda_t)) \xrightarrow{p} \alpha_f\alpha_t\rho_1 + \alpha_f(1-\alpha_t)\rho_2 + (1-\alpha_f)\alpha_t\rho_3 + (1-\alpha_f)(1-\alpha_t)\rho_4, \qquad (A.2)$$

where, when $\sigma(M) > 1$, $0 < \rho_1, \rho_2, \rho_3, \rho_4 < 1$ are given by the largest solution to the recursive relations

$$\rho_i = 1 - \exp\Big\{-\sum_{j=1}^{4} M(i,j)\,\rho_j\Big\}, \qquad i = 1, 2, 3, 4. \qquad (A.3)$$
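The largest solution of (A.3) can be reached by iterating from $\rho_1 = \cdots = \rho_4 = 1$, after which (A.2) is a weighted sum. A minimal sketch with hypothetical names, reusing the same construction of $M$ from (A.1) (assumes $0 < \alpha_f, \alpha_t \le 1$):

```python
import math


def four_type_matrix(alpha_f, alpha_t, lam_w, lam_f, lam_t):
    """Return (M, mu) with M(i, j) = kappa(i, j) * mu_j, kappa from (A.1)."""
    mu = [alpha_f * alpha_t, alpha_f * (1.0 - alpha_t),
          (1.0 - alpha_f) * alpha_t, (1.0 - alpha_f) * (1.0 - alpha_t)]
    a = lam_w + lam_f / alpha_f + lam_t / alpha_t
    b = lam_w + lam_f / alpha_f
    c = lam_w + lam_t / alpha_t
    w = lam_w
    kappa = [[a, b, c, w], [b, b, w, w], [c, w, c, w], [w, w, w, w]]
    return [[kappa[i][j] * mu[j] for j in range(4)] for i in range(4)], mu


def epidemic_size_four_types(alpha_f, alpha_t, lam_w, lam_f, lam_t,
                             iters=100_000, tol=1e-13):
    """Iterate (A.3) from all ones; return (fraction from (A.2), [rho_1..rho_4])."""
    M, mu = four_type_matrix(alpha_f, alpha_t, lam_w, lam_f, lam_t)
    rho = [1.0] * 4
    for _ in range(iters):
        new = [1.0 - math.exp(-sum(M[i][j] * rho[j] for j in range(4)))
               for i in range(4)]
        converged = max(abs(x - y) for x, y in zip(new, rho)) < tol
        rho = new
        if converged:
            break
    return sum(m * r for m, r in zip(mu, rho)), rho
```

As a consistency check, with $\lambda_t = 0$ the result should match the two-type answer from (37) and (38) at $\alpha = \alpha_f$ (about 0.39 for $\alpha_f = 0.5$, $\lambda_w = \lambda_f = 0.8$); with $\alpha_f = \alpha_t = 1$ all vertices are of type 1 and the iteration reduces to the ER equation with mean degree $\lambda_w + \lambda_f + \lambda_t$.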

In the case where there is no constraint on the transmissibilities $T_w$, $T_f$ and $T_t$, the conclusions (A.1), (A.2) and (A.3) still apply if we substitute $T_w\lambda_w$ for $\lambda_w$, $T_f\lambda_f$ for $\lambda_f$ and $T_t\lambda_t$ for $\lambda_t$.



